OPTIMAL CONCENTRATION INEQUALITIES FOR DYNAMICAL SYSTEMS



JEAN-RENÉ CHAZOTTES, SÉBASTIEN GOUËZEL



Abstract. For dynamical systems modeled by a Young tower with exponential tails, we 
prove an exponential concentration inequality for all separately Lipschitz observables of 
n variables. When tails are polynomial, we prove polynomial concentration inequalities. 
Those inequalities are optimal. We give some applications of such inequalities to specific 
systems and specific observables.

1. Introduction

Let $X$ be a metric space. A function $K$ on $X^n$ is separately Lipschitz if, for all $i$, there
exists a constant $\mathrm{Lip}_i(K)$ with

\[ |K(x_0,\dots,x_{i-1},x_i,x_{i+1},\dots,x_{n-1}) - K(x_0,\dots,x_{i-1},x_i',x_{i+1},\dots,x_{n-1})| \le \mathrm{Lip}_i(K)\, d(x_i,x_i'), \]

for all points $x_0,\dots,x_{n-1},x_i'$ in $X$.

Consider a stationary process $(Z_0, Z_1, \dots)$ taking values in $X$. We say that this process
satisfies an exponential concentration inequality if there exists a constant $C$ such that, for
any separately Lipschitz function $K(x_0,\dots,x_{n-1})$, one has

\[ \mathbb{E}\bigl(e^{K(Z_0,\dots,Z_{n-1}) - \mathbb{E}(K(Z_0,\dots,Z_{n-1}))}\bigr) \le e^{C\sum_{i=0}^{n-1}\mathrm{Lip}_i(K)^2}. \tag{1.1} \]

One should stress that this inequality is valid for all $n$ (i.e., the constant $C$ does not depend
on the number of variables one is considering). An important consequence of such an
inequality is a control on the deviation probabilities: for all $t > 0$,

\[ \mathbb{P}\bigl(|K(Z_0,\dots,Z_{n-1}) - \mathbb{E}(K(Z_0,\dots,Z_{n-1}))| > t\bigr) \le 2 \exp\Bigl(-\frac{t^2}{4C\sum_{i=0}^{n-1}\mathrm{Lip}_i(K)^2}\Bigr). \]

This inequality follows from the inequality $\mathbb{P}(Y > t) \le e^{-\lambda t}\,\mathbb{E}(e^{\lambda Y})$ ($\lambda > 0$) with
$Y = K(Z_0,\dots,Z_{n-1}) - \mathbb{E}(K(Z_0,\dots,Z_{n-1}))$: we then use inequality (1.1) and optimize over $\lambda$
by taking $\lambda = t/(2C\sum_{i=0}^{n-1}\mathrm{Lip}_i(K)^2)$.
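The optimization step of this Chernoff-type argument can be spelled out as follows. This is only a sketch of the standard computation; the shorthand $S$ for the sum of squared Lipschitz constants is introduced here for readability and is not notation from the text.

```latex
% Shorthand (assumption of this sketch): S = \sum_{i=0}^{n-1} \mathrm{Lip}_i(K)^2.
% Apply (1.1) to \lambda K, whose Lipschitz constants are \lambda \mathrm{Lip}_i(K):
\[
  \mathbb{P}(Y > t) \le e^{-\lambda t}\, \mathbb{E}(e^{\lambda Y})
  \le e^{-\lambda t + C \lambda^2 S}.
\]
% The exponent -\lambda t + C \lambda^2 S is minimal at \lambda = t/(2CS), where it equals -t^2/(4CS):
\[
  \mathbb{P}(Y > t) \le e^{-t^2/(4CS)}.
\]
% Applying the same bound to -K controls \mathbb{P}(Y < -t), which accounts for the factor 2.
```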

In some cases, it is not reasonable to hope for such an exponential inequality. One says
that $(Z_0, Z_1, \dots)$ satisfies a polynomial concentration inequality with moment $Q \ge 2$ if there
exists a constant $C$ such that, for any separately Lipschitz function $K(x_0,\dots,x_{n-1})$, one
has

\[ \mathbb{E}\bigl(|K(Z_0,\dots,Z_{n-1}) - \mathbb{E}(K(Z_0,\dots,Z_{n-1}))|^Q\bigr) \le C\Bigl(\sum_{i=0}^{n-1}\mathrm{Lip}_i(K)^2\Bigr)^{Q/2}. \tag{1.2} \]



An important consequence of such an inequality is a control on the deviation probabilities:
for all $t > 0$,

\[ \mathbb{P}\bigl(|K(Z_0,\dots,Z_{n-1}) - \mathbb{E}(K(Z_0,\dots,Z_{n-1}))| > t\bigr) \le C t^{-Q}\Bigl(\sum_{i=0}^{n-1}\mathrm{Lip}_i(K)^2\Bigr)^{Q/2}. \tag{1.3} \]

The inequality (1.3) readily follows from (1.2) and the Markov inequality. However, it is
weaker in general. We will say that $(Z_0, Z_1, \dots)$ satisfies a weak concentration inequality
if (1.3) holds for any separately Lipschitz function $K$.

For instance, if $Z_0, Z_1, \dots$ is an i.i.d. process, then it satisfies an exponential concentration
inequality if $Z_i$ is bounded [Led01, Page 68], a polynomial concentration inequality with
moment $Q \ge 2$ if $Z_i \in L^Q$ [BBLM05], and a weak concentration inequality if
$\mathbb{P}(|Z_i| > t) \le C t^{-Q}$ (while we could not locate a proper reference in the literature, this follows easily
from classical martingale techniques and a weak Rosenthal-Burkholder inequality; see
Theorem 6.3 below).

Our main goal in this article is to study processes coming from dynamical systems: we
consider a map $T$ on a metric space $X$, and an invariant probability measure $\mu$. Under
suitable assumptions, we wish to show that the process $(x, Tx, \dots)$ (where $x$ is
distributed following $\mu$) satisfies concentration inequalities. Equivalently, we are interested
in the concentration properties of the measure $\mu_n$ on $X^n$ given by
$d\mu_n(x_0,\dots,x_{n-1}) = d\mu(x_0)\,\delta_{x_1 = Tx_0}\cdots\delta_{x_{n-1} = Tx_{n-2}}$.
This is not a product measure but, if the map $T$ is sufficiently mixing, one may expect that
$T^k(x)$ is more or less independent of $x$ if $k$ is large,
making the process $(x, Tx, \dots)$ look like an independent process to some extent.

Such questions have already been considered in the literature. In particular, [CMS02]
proves that a (not necessarily Markov) piecewise uniformly expanding map of the interval
satisfies an exponential concentration inequality. Polynomial concentration inequalities
(with moment 2, also called Devroye inequalities) have been proved in less expanding
situations (exponential Young towers, including Hénon maps, in [CCS05a]; intermittent maps
with parameter close enough to 0 in [CCRV09]). Our goal is to prove optimal concentration
inequalities for the same kind of systems. In particular, we will prove that Young towers
with exponential tails satisfy an exponential concentration inequality, and that in Young
towers with polynomial tails one can get polynomial concentration with a moment directly
related to the tails of the return time on the basis of the tower.

Concentration inequalities are a tool to bound systematically the fluctuations of
'complicated' observables of the form $K(x, Tx, \dots, T^{n-1}x)$. For instance, the function $K$ can
have a complicated analytic expression or can be implicitly defined (e.g. as an optimization
problem). If we are able to get a good estimate of the Lipschitz constants, we can apply the
concentration inequality we have at our disposal. Various examples of observables have been
studied in [CMS02, CCS05b, CCRV09]. Since we establish here optimal concentration
inequalities, this improves automatically the bounds previously available for these observables.
We shall state explicitly some of the new results which can be obtained.

Outline of the article: The proofs we will use for different classes of systems are all
based on classical martingale arguments. It is enlightening to explain them in the simplest
possible situation, subshifts of finite type endowed with a Gibbs measure. We will do so
in Section 2. The following 4 sections are devoted to proofs of concentration inequalities
in various kinds of dynamical systems with a combinatorial nature, namely Young towers
with exponential tails in Section 3, with polynomial tails in Section 4 (the invertible case is
explained in Section 5), and with weak polynomial tails in Section 6. Several applications to
concrete dynamical systems and to specific observables are described in Section 7. Finally,
an appendix is devoted to the proof of a particularly technical lemma.

In this paper, the letter C denotes a constant that can vary from line to line (or even on 
a single line). 

2. Subshifts of finite type 

In this section, we describe a strategy to prove concentration inequalities. It is very
classical, uses martingales, and was for instance implemented for dynamical systems in
[CMS02] and for weakly dependent processes in [Rio00]. Our proofs for more complicated
systems will also rely on this strategy. However, it is enlightening to explain it in the
simplest situation, subshifts of finite type.

2.1. Unilateral subshifts of finite type. Let $X \subset \Sigma^{\mathbb{N}}$ be the state space of a
topologically mixing one-sided subshift of finite type, with an invariant Gibbs measure $\mu$, and the
combinatorial distance $d(x,y) = \beta^{s(x,y)}$, where $\beta < 1$ is some fixed number and $s(x,y)$ is the
separation time of $x$ and $y$, i.e., the minimum number $n$ such that $T^n x$ and $T^n y$ do not
belong to the same element of the Markov partition. Writing $x = (x_0 x_1 \dots)$ and $y = (y_0 y_1 \dots)$,
then $s(x,y) = \inf\{n : x_n \ne y_n\}$.
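As a concrete illustration of the combinatorial distance, the following sketch computes the separation time and the distance for finite words, which stand in for truncations of infinite sequences. The function names and the default value `beta=0.5` are assumptions made for this example only.

```python
def separation_time(x, y):
    """Return inf{n : x[n] != y[n]} for two finite words.

    Assumption of this sketch: finite words stand in for truncated infinite
    sequences, so words that agree on their whole common prefix are assigned
    the length of that prefix.
    """
    for n, (a, b) in enumerate(zip(x, y)):
        if a != b:
            return n
    return min(len(x), len(y))


def distance(x, y, beta=0.5):
    """Combinatorial distance d(x, y) = beta ** s(x, y), with 0 < beta < 1."""
    return beta ** separation_time(x, y)
```

With this distance, two sequences are close exactly when they share a long initial block, and the strong triangle inequality $d(x,z) \le \max(d(x,y), d(y,z))$ holds.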

Theorem 2.1. The system $(X, T, \mu)$ satisfies an exponential concentration inequality.

Fix a separately Lipschitz function $K(x_0,\dots,x_{n-1})$. We consider it as a function on $X^{\mathbb{N}}$
depending only on the first $n$ coordinates (therefore, we will write $\mathrm{Lip}_i(K) = 0$ for $i \ge n$).
We endow $X^{\mathbb{N}}$ with the measure $\mu_\infty$, limit of the $\mu_N$ when $N \to \infty$. On $X^{\mathbb{N}}$, let $\mathcal{F}_p$ be the
$\sigma$-algebra of events depending only on the coordinates $(x_j)_{j \ge p}$ (this is a decreasing sequence
of $\sigma$-fields). We want to write the function $K$ as a sum of reverse martingale differences
with respect to this sequence. Therefore, let $K_p = \mathbb{E}(K|\mathcal{F}_p)$ and $D_p = K_p - K_{p+1}$. The
function $D_p$ is $\mathcal{F}_p$-measurable and $\mathbb{E}(D_p|\mathcal{F}_{p+1}) = 0$. Moreover, $K - \mathbb{E}(K) = \sum_{p \ge 0} D_p$.

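The main point of the proof is to get a good bound on $D_p$:

Lemma 2.2. There exist $C > 0$ and $\rho < 1$ such that, for any $p$, one has

\[ |D_p| \le C\sum_{j=0}^{p}\rho^{p-j}\,\mathrm{Lip}_j(K). \]

We then use the Hoeffding-Azuma inequality (see e.g. [MS86, Page 33] or [Led01, Page
68]), saying that for such a sum of martingale increments,

\[ \mathbb{E}\bigl(e^{\sum_{p=0}^{P-1} D_p}\bigr) \le e^{\sum_{p=0}^{P-1}\sup|D_p|^2}. \]

The Cauchy-Schwarz inequality gives

\[ \Bigl(\sum_{j=0}^{p}\rho^{p-j}\,\mathrm{Lip}_j(K)\Bigr)^2 \le \Bigl(\sum_{j=0}^{p}\rho^{p-j}\,\mathrm{Lip}_j(K)^2\Bigr)\cdot\Bigl(\sum_{j=0}^{p}\rho^{p-j}\Bigr) \le C\sum_{j=0}^{p}\rho^{p-j}\,\mathrm{Lip}_j(K)^2. \]

Summing over $p$, we get $\sum_{p=0}^{\infty}\sup|D_p|^2 \le C\sum_j \mathrm{Lip}_j(K)^2$. Using the Hoeffding-Azuma
inequality at a fixed index $P$, and then letting $P$ tend to infinity, we get
$\mathbb{E}(e^{\sum D_p}) \le e^{C\sum_j \mathrm{Lip}_j(K)^2}$, which is the desired exponential concentration inequality since
$\sum D_p = K - \mathbb{E}(K)$.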
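The summation over $p$ used here is an exchange of the order of summation followed by a geometric series. A short worked version, given only as a sketch of this routine step:

```latex
\[
  \sum_{p \ge 0} \sum_{j=0}^{p} \rho^{p-j}\, \mathrm{Lip}_j(K)^2
  = \sum_{j \ge 0} \mathrm{Lip}_j(K)^2 \sum_{p \ge j} \rho^{p-j}
  = \frac{1}{1-\rho} \sum_{j \ge 0} \mathrm{Lip}_j(K)^2 .
\]
% The resulting constant depends on \rho only, not on the number n of variables.
```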

It remains to prove Lemma 2.2. Let $g$ denote the inverse of the jacobian of $T$, and $g^{(k)}$ the
inverse of the jacobian of $T^k$. Let $\mathcal{L}$ denote the transfer operator associated to the map $T$,
defined by duality by $\int u \cdot v\circ T \, d\mu = \int \mathcal{L}u \cdot v \, d\mu$. It can be written as $\mathcal{L}u(x) = \sum_{Ty=x} g(y)u(y)$.
In the same way, $\mathcal{L}^k u(x) = \sum_{T^k y=x} g^{(k)}(y)u(y)$. One can define a Markov chain by jumping
from a point $x$ to one of its preimages $y$ with the probability $g(y)$; then $\mathcal{L}$ is simply the
Markov operator corresponding to this Markov chain. In particular,

\[ K_p(x_p, x_{p+1}, \dots) = \mathbb{E}(K|\mathcal{F}_p)(x_p, x_{p+1}, \dots) = \mathbb{E}(K(X_0,\dots,X_{p-1},x_p,\dots) \,|\, X_p = x_p) = \sum_{T^p(y)=x_p} g^{(p)}(y)\,K(y,\dots,T^{p-1}y,x_p,\dots). \]
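To make the transfer operator concrete, here is a minimal sketch for the doubling map $T(x) = 2x \bmod 1$ with Lebesgue measure, for which every point has the two preimages $x/2$ and $(x+1)/2$, each with weight $g = 1/2$. The doubling map is an assumption of this example (it is not treated in the text); it is chosen because it is coded by the full shift on two symbols.

```python
def transfer(u):
    """Transfer operator of the doubling map T(x) = 2x mod 1 (Lebesgue measure).

    (L u)(x) = sum over preimages y of x of g(y) u(y), with g = 1/2 and
    preimages x/2 and (x + 1)/2.
    """
    return lambda x: 0.5 * u(x / 2) + 0.5 * u((x + 1) / 2)


def iterate(u, k):
    """Return L^k u as a function, by applying `transfer` k times."""
    for _ in range(k):
        u = transfer(u)
    return u


# L fixes constants, and L^k u approaches the integral of u (here 1/2 for u(x) = x).
constant_image = transfer(lambda x: 1.0)(0.3)
averaged_value = iterate(lambda x: x, 10)(0.3)
```

In this example `iterate(u, k)(x)` is exactly the average of `u` over the `2**k` preimages of `x`, which is the averaging mechanism used in the formula for $K_p$.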

To prove that $D_p$ is bounded, i.e., that $K_p$ is close to $K_{p+1}$, one should show that this quantity
does not depend too much on $x_p$. The preimages of $x_p$ under $T^p$ equidistribute in the space,
therefore one should be able to show that $K_p$ is close to an integral quantity. This is done
in the following lemma.

Lemma 2.3. We have

\[ \Bigl| K_p(x_p,\dots) - \int K(y,\dots,T^{p-1}y,x_p,\dots)\,d\mu(y) \Bigr| \le C\sum_{j=0}^{p-1}\mathrm{Lip}_j(K)\,\rho^{p-j}, \]

where $C > 0$ and $\rho < 1$ only depend on $(X,T)$.

This lemma implies in particular that $K_p(x_p, x_{p+1}, \dots) - K_p(x_p', x_{p+1}, \dots)$ is bounded by
$C\sum_{j=0}^{p}\mathrm{Lip}_j(K)\,\rho^{p-j}$. Averaging over the preimages $x_p'$ of $x_{p+1}$, we get the same bound for
$D_p(x_p, x_{p+1}, \dots)$, proving Lemma 2.2.

Proof of Lemma 2.3. The equidistribution of the Markov chain starting from $x_p$ is
formulated most conveniently in terms of the transfer operators, which act on functions of one
variable. Therefore, we will eliminate the variables $x_0,\dots,x_{p-1}$ one after the other. Let us
fix a point $x_*$ in $X$. We decompose $K_p$ as

\[ K_p(x_p,\dots) = \sum_{i=0}^{p-1}\sum_{T^p(y)=x_p} g^{(p)}(y)\bigl(K(y,\dots,T^i y,x_*,\dots,x_*,x_p,\dots) - K(y,\dots,T^{i-1}y,x_*,\dots,x_*,x_p,\dots)\bigr) + K(x_*,\dots,x_*,x_p,\dots). \]

For fixed $i$, we may group together those points $y \in T^{-p}(x_p)$ that have the same image
under $T^i$, splitting the sum $\sum_{T^p(y)=x_p}$ as $\sum_{T^{p-i}(z)=x_p}\sum_{T^i(y)=z}$. Since the jacobian is
multiplicative, one has $g^{(p)}(y) = g^{(i)}(y)\,g^{(p-i)}(z)$. Let us define a function

\[ f_i(z) = \sum_{T^i y=z} g^{(i)}(y)\bigl(K(y,\dots,T^i y,x_*,\dots,x_*,x_p,\dots) - K(y,\dots,T^{i-1}y,x_*,\dots,x_*,x_p,\dots)\bigr) =: \sum_{T^i y=z} g^{(i)}(y)\,H(y,\dots,T^i y). \tag{2.1} \]

Denoting by $\mathcal{L}$ the transfer operator (which satisfies $\mathcal{L}^k f(x) = \sum_{T^k(z)=x} g^{(k)}(z)f(z)$), we
obtain

\[ K_p(x_p,\dots) = \sum_{i=0}^{p-1}\mathcal{L}^{p-i}f_i(x_p) + K(x_*,\dots,x_*,x_p,\dots). \]

The function $H$ is bounded by $\mathrm{Lip}_i(K)$, hence $|f_i| \le C\,\mathrm{Lip}_i(K)$ (since $\sum_{T^i y=z} g^{(i)}(y) = 1$
by invariance of the measure). To estimate the Lipschitz norm of $f_i$, we write

\[ f_i(z) - f_i(z') = \sum \bigl(g^{(i)}(y) - g^{(i)}(y')\bigr)H(y,\dots,T^i y) + \sum g^{(i)}(y')\bigl(H(y,\dots,T^i y) - H(y',\dots,T^i y')\bigr), \tag{2.2} \]

where $z$ and $z'$ are two points in the same partition element, and their respective preimages $y$,
$y'$ are paired according to the cylinder of length $i$ they belong to. A distortion control gives
$|g^{(i)}(y) - g^{(i)}(y')| \le C g^{(i)}(y)\,d(z,z')$, hence the first sum is bounded by $C\,\mathrm{Lip}_i(K)\,d(z,z')$.
For the second sum, substituting successively each $T^j y$ with $T^j y'$, we have

\[ |H(y,\dots,T^i y) - H(y',\dots,T^i y')| \le 2\sum_{j=0}^{i}\mathrm{Lip}_j(K)\,d(T^j y,T^j y') \le 2\sum_{j=0}^{i}\mathrm{Lip}_j(K)\,\beta^{i-j}\,d(z,z'). \]

Summing over the different preimages of $z$, we deduce that the Lipschitz norm of $f_i$ is
bounded by $C\sum_{j=0}^{i}\mathrm{Lip}_j(K)\,\beta^{i-j}$.

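Let $\mathcal{C}$ be the space of Lipschitz functions on $X$, with its canonical norm $\|f\|_{\mathcal{C}} = \sup|f| + \mathrm{Lip}(f)$.
The operator $\mathcal{L}$ has a spectral gap on $\mathcal{C}$: there exist $C > 0$ and $\rho < 1$ such
that $\|\mathcal{L}^k f - \int f\,d\mu\|_{\mathcal{C}} \le C\rho^k\|f\|_{\mathcal{C}}$. We get
$\|\mathcal{L}^{p-i}f_i - \int f_i\,d\mu\|_{\mathcal{C}} \le C\rho^{p-i}\sum_{j=0}^{i}\mathrm{Lip}_j(K)\,\beta^{i-j}$.
This bound in $\mathcal{C}$ implies in particular a bound for the supremum. Increasing $\rho$ if necessary,
we can assume $\rho \ge \beta$. Summing those bounds, one obtains

\[ \Bigl| K_p(x_p,\dots) - \sum_{i=0}^{p-1}\int f_i\,d\mu - K(x_*,\dots,x_*,x_p,\dots) \Bigr| \le C\sum_{i=0}^{p-1}\rho^{p-i}\sum_{j=0}^{i}\mathrm{Lip}_j(K)\,\rho^{i-j} \le C\sum_{j=0}^{p-1}\mathrm{Lip}_j(K)\,(p-j)\,\rho^{p-j} \le C'\sum_{j=0}^{p-1}\mathrm{Lip}_j(K)\,(\rho')^{p-j}, \]

for any $\rho' \in (\rho, 1)$.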

Finally, when one computes the sum of the integrals of $f_i$, there are again cancellations,
leaving only $\int K(y,\dots,T^{p-1}y,x_p,\dots)\,d\mu(y)$. □
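The cancellation in this last step is a telescoping sum. The following sketch makes it explicit; the shorthand $K^{(i)}$ is introduced here only for this computation and does not appear in the text.

```latex
% Shorthand (assumption of this sketch):
%   K^{(i)}(y) = K(y, \dots, T^i y, x_*, \dots, x_*, x_p, \dots)  for 0 \le i \le p-1,
%   K^{(-1)}   = K(x_*, \dots, x_*, x_p, \dots)  (no dependence on y).
% By definition f_i = \mathcal{L}^i (K^{(i)} - K^{(i-1)}), and \int \mathcal{L}^i u \, d\mu = \int u \, d\mu, so
\[
  \int f_i \, d\mu = \int \bigl( K^{(i)}(y) - K^{(i-1)}(y) \bigr) \, d\mu(y).
\]
% Summing over i, all intermediate terms cancel:
\[
  \sum_{i=0}^{p-1} \int f_i \, d\mu
  = \int K(y, \dots, T^{p-1} y, x_p, \dots) \, d\mu(y) - K(x_*, \dots, x_*, x_p, \dots).
\]
```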






2.2. Bilateral subshifts of finite type. We consider now $X_{\mathbb{Z}} \subset \Sigma^{\mathbb{Z}}$ the state space of
a topologically mixing bilateral subshift of finite type, together with an invariant Gibbs
measure $\mu_{\mathbb{Z}}$. For two points $x = (\dots x_{-1}x_0x_1\dots)$ and $y = (\dots y_{-1}y_0y_1\dots)$ in $X_{\mathbb{Z}}$, let $s_{\mathbb{Z}}(x,y)$
be their bilateral separation time, i.e., $\inf\{|n| : x_n \ne y_n\}$, and define a distance
$d_{\mathbb{Z}}(x,y) = \beta^{s_{\mathbb{Z}}(x,y)}$ for some $\beta < 1$. We denote a function on $X_{\mathbb{Z}}^n$ by $K_{\mathbb{Z}}(x_0,\dots,x_{n-1})$, to emphasize
the dependence both on the past and the future.

Theorem 2.4. The system $(X_{\mathbb{Z}}, T, \mu_{\mathbb{Z}})$ satisfies an exponential concentration inequality.

This is stronger than Theorem 2.1, which proves this statement for functions
$K_{\mathbb{Z}}(x_0,\dots,x_{n-1})$ depending only on the future $(x_i)_0^\infty$ of each variable. We will deduce Theorem 2.4
from this statement by an approximation argument, by sending everything far away in the
future.

Proof. Let us first assume that $X_{\mathbb{Z}}$ is the full shift. We fix a function $K_{\mathbb{Z}}(x_0,\dots,x_{n-1})$
depending both on the past and future of the variables. For $N \in \mathbb{N}$, we define
$K_N(x_0,\dots,x_{n+N-1}) = K_{\mathbb{Z}}(x_N,\dots,x_{n+N-1})$. Thanks to the invariance of the measure, it is equivalent
to prove concentration inequalities for $K_{\mathbb{Z}}$ or $K_N$.

Let us now define a function $\Phi_N : X_{\mathbb{Z}}^{n+N} \to X_{\mathbb{Z}}^{n+N}$ depending only on the future of the
variables, and let us write $\tilde{K}_N = K_N \circ \Phi_N$. Since this function only depends on the future,
Theorem 2.1 applies to it.

We set $\Phi_N(x_0,\dots,x_{n+N-1}) = (y_0,\dots,y_{n+N-1})$, where the $y_i$ are defined inductively as
follows. First, let us choose an arbitrary past $(p)_{-\infty}^{-1}$, and let $y_0 = ((p)_{-\infty}^{-1}, (x_0)_0^\infty)$; it only
depends on the future of $x_0$. If $y_0,\dots,y_{i-1}$ are already defined, we let $y_i = ((y_{i-1})_{-\infty}^{0}, (x_i)_0^\infty)$.
In other words,

\[ y_i = \bigl((p)_{-\infty}^{-1}, (x_0)_0, (x_1)_0, \dots, (x_{i-1})_0, (x_i)_0^\infty\bigr), \tag{2.3} \]

with an origin laid on $(x_i)_0$. This defines the function $\Phi_N$, only depending on the future of
the points.

Let us study the Lipschitz constants of $\tilde{K}_N = K_N \circ \Phi_N$. If we fix $x_j$ for $j \ne i$ and vary
$x_i$, then we change $y_j$ for $j \ge i$, at its coordinate with index $-(j-i)$. Therefore,

\[ \mathrm{Lip}_i(\tilde{K}_N) \le \sum_{j \ge i}\mathrm{Lip}_j(K_N)\,\beta^{j-i}. \]

With the Cauchy-Schwarz inequality, we get $\sum \mathrm{Lip}_i(\tilde{K}_N)^2 \le C\sum \mathrm{Lip}_i(K_N)^2 = C\sum \mathrm{Lip}_i(K_{\mathbb{Z}})^2$,
for some constant $C$. Applying Theorem 2.1 to $\tilde{K}_N$ and changing variables by $x' = T^N x$,
we obtain

\[ \int e^{\tilde{K}_N(T^{-N}x',\dots,T^{-1}x',x',\dots,T^{n-1}x')}\,d\mu_{\mathbb{Z}}(x') \le e^{\int \tilde{K}_N(T^{-N}x',\dots,T^{-1}x',x',\dots,T^{n-1}x')\,d\mu_{\mathbb{Z}}(x')}\; e^{C\sum_{i=0}^{n-1}\mathrm{Lip}_i(K_{\mathbb{Z}})^2}. \]

By construction, the function $\tilde{K}_N(T^{-N}x',\dots,T^{-1}x',x',\dots,T^{n-1}x')$ converges to
$K_{\mathbb{Z}}(x',\dots,T^{n-1}x')$ when $N$ tends to infinity. Hence, the previous equation gives the desired
exponential concentration.

When $X_{\mathbb{Z}}$ is not the full shift, there is an additional difficulty: one cannot define $\Phi_N$ as
above, since a point defined in (2.3) might use forbidden transitions. We should therefore
modify the definition of $\Phi_N$ as follows. For any symbol $a$ of the alphabet, we fix a legal
past $p(a)$ of $a$. We define $\Phi_N(x_0,\dots,x_{N+n-1}) = (y_0,\dots,y_{N+n-1})$ by $y_0 = (p((x_0)_0), (x_0)_0^\infty)$
(this point is admissible). Then, if the transition from $(x_{i-1})_0$ to $(x_i)_0$ is permitted, we
let $y_i = ((y_{i-1})_{-\infty}^{0}, (x_i)_0^\infty)$, and otherwise we let $y_i = (p((x_i)_0), (x_i)_0^\infty)$. Therefore, the
points $y_i$ only use permitted transitions. The rest of the argument goes through without
modification. □



3. Uniform Young towers with exponential tails 

There are two different definitions of Young towers, given respectively in [You98] and
[You99]. The difference is on the definition of the separation time: in the first definition, one
considers that the dynamics is expanding at every iteration, while in the second definition
one considers that the dynamics is expanding only when one returns to the basis of the
tower. Therefore, there is less expansion with the second definition than with the first one,
making it more difficult to handle. We will say that Young towers in the first sense are
uniform, while Young towers in the second sense are non-uniform. In this section, we work
with the (easier) first definition, which turns out to be the most interesting when dealing
with exponential tails. Here is the formal definition of a uniform Young tower: it is a space
$\Delta$ satisfying the following properties.

(1) This space is partitioned into subsets $\Delta_{\alpha,\ell}$ (for $\alpha \in \mathbb{N}$ and $\ell \in [0, \phi(\alpha)-1]$, where $\phi$
is an integer-valued return time function). The dynamics sends bijectively $\Delta_{\alpha,\ell}$ on
$\Delta_{\alpha,\ell+1}$ if $\ell < \phi(\alpha)-1$, and $\Delta_{\alpha,\phi(\alpha)-1}$ on $\Delta_0 := \bigcup_\alpha \Delta_{\alpha,0}$.

(2) The distance is given by $d(x,y) = \beta^{s(x,y)}$, where $\beta < 1$ and $s(x,y)$ is the separation
time for the whole dynamics, i.e., the first $n$ such that $T^n x$ and $T^n y$ are not in the
same element of the partition.

(3) There is an invariant probability measure $\mu$ such that the inverse $g$ of its jacobian
satisfies $|g(x)/g(y) - 1| \le C d(Tx,Ty)$ for any $x$ and $y$ in the same element of the
partition.

(4) We have $\gcd(\phi(\alpha) : \alpha \in \mathbb{N}) = 1$ (i.e., the tower is aperiodic).

When the return time function $\phi$ has exponential tails, i.e., there exists $c_0 > 0$ with
$\int_{\Delta_0} e^{c_0\phi}\,d\mu < \infty$, we say that the tower has exponential tails. We will write $h(x) = \ell$ if
$x \in \Delta_{\alpha,\ell}$; this is the height of the point in the tower. For $x \in \Delta$, we will also denote by $\pi x$
its projection in the basis, i.e., the unique point $y \in \Delta_0$ such that $T^{h(x)}(y) = x$.

Theorem 3.1. Let $(\Delta, T, \mu)$ be a uniform Young tower with exponential tails. It satisfies
an exponential concentration inequality: there exists $C > 0$ such that, for any $n \in \mathbb{N}$, for
any separately Lipschitz function $K(x_0,\dots,x_{n-1})$,

\[ \int e^{K(x,Tx,\dots,T^{n-1}x)}\,d\mu(x) \le e^{\int K(x,\dots,T^{n-1}x)\,d\mu(x)}\; e^{C\sum_{i=0}^{n-1}\mathrm{Lip}_i(K)^2}. \tag{3.1} \]

Let us first remark that, for any $\epsilon_0 > 0$, it is sufficient to prove the theorem for functions
$K$ such that $\mathrm{Lip}_i(K) \le \epsilon_0$ for all $i$. Assume indeed that this is the case, and let us prove
the general case. Let $K(x_0,\dots,x_{n-1})$ be a separately Lipschitz function. Let us fix an
arbitrary point $x_*$ in $\Delta$. To any $(x_0,\dots,x_{n-1})$, we associate $(y_0,\dots,y_{n-1})$ by $y_i = x_i$
if $\mathrm{Lip}_i(K) \le \epsilon_0$ and $y_i = x_*$ otherwise. The function $\tilde{K}(x_0,\dots,x_{n-1}) = K(y_0,\dots,y_{n-1})$
satisfies $\mathrm{Lip}_i(\tilde{K}) \le \epsilon_0$ for all $i$. Moreover,

\[ |K - \tilde{K}| \le \sum_i \mathrm{Lip}_i(K)\,1(\mathrm{Lip}_i(K) > \epsilon_0) \le \sum_i \mathrm{Lip}_i(K)^2/\epsilon_0. \]

Therefore, the inequality (3.1) for $\tilde{K}$ readily implies the same inequality for $K$, with a
different constant $C' = C + 2/\epsilon_0$.

Let us now fix a suitable $\epsilon_0$ (the precise conditions will be given in the proof of Lemma 3.3),
and let us consider a function $K$ with $\mathrm{Lip}_i(K) \le \epsilon_0$ for all $i$. To prove the exponential
concentration inequality, we follow the strategy of Section 2. Let $K_p(x_p,\dots) = \mathbb{E}(K|\mathcal{F}_p)(x_p,\dots)$;
the first step is to prove an analogue of Lemma 2.3. Since the transfer operator has a
spectral gap on a suitable space of functions, as shown by Young in [You98], we can easily
mimic the proof of this lemma.
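For the reduction to small Lipschitz constants made just after the statement of Theorem 3.1, the value of the new constant can be traced as follows. This is a sketch; the shorthand $\delta$ is introduced here only for this computation.

```latex
% Shorthand (assumption of this sketch): \delta = \sum_i \mathrm{Lip}_i(K)^2 / \epsilon_0, so that |K - \tilde{K}| \le \delta.
% Note also that \mathrm{Lip}_i(\tilde{K}) \le \mathrm{Lip}_i(K) for all i.
\[
  \int e^{K} \, d\mu
  \le e^{\delta} \int e^{\tilde{K}} \, d\mu
  \le e^{\delta}\, e^{\int \tilde{K} \, d\mu}\, e^{C \sum_i \mathrm{Lip}_i(\tilde{K})^2}
  \le e^{2\delta}\, e^{\int K \, d\mu}\, e^{C \sum_i \mathrm{Lip}_i(K)^2}
  = e^{\int K \, d\mu}\, e^{(C + 2/\epsilon_0) \sum_i \mathrm{Lip}_i(K)^2}.
\]
% Here K and \tilde{K} are evaluated along the orbit (x, Tx, \dots, T^{n-1}x).
```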



Lemma 3.2. For all $x_p \in \Delta_0$,

\[ \Bigl| K_p(x_p,\dots) - \int K(y,\dots,T^{p-1}y,x_p,\dots)\,d\mu(y) \Bigr| \le C\sum_{j=0}^{p-1}\mathrm{Lip}_j(K)\,\rho^{p-j}, \]

where $C > 0$ and $\rho < 1$ only depend on $\Delta$.



The main difference with the subshift case is that this bound is only valid for $h(x_p) = 0$.
It is of course false if $h(x_p)$ is large, since there is no averaging mechanism in this case.

Proof. As in the proof of Lemma 2.3, we write

\[ K_p(x_p,\dots) = \sum_{i=0}^{p-1}\mathcal{L}^{p-i}f_i(x_p) + K(x_*,\dots,x_*,x_p,\dots), \]

where the function $f_i$ is bounded by $C\,\mathrm{Lip}_i(K)$, and the Lipschitz norm of $f_i$ on any partition
element is at most $C\sum_{j=0}^{i}\mathrm{Lip}_j(K)\,\rho^{i-j}$ for some $\rho < 1$.

Let $\mathcal{C}$ be the space of functions on $\Delta$ such that $|f(x)| \le Ce^{\epsilon h(x)}$ and
$|f(x) - f(y)| \le C d(x,y)\,e^{\epsilon h(x)}$ for all $x$, $y$ in the same partition element. Young proves in [You98] that, if
$\epsilon$ is small enough, then $\mathcal{L}$ has a spectral gap on $\mathcal{C}$: there exist $C > 0$ and $\rho < 1$ such that
$\|\mathcal{L}^k f - \int f\,d\mu\|_{\mathcal{C}} \le C\rho^k\|f\|_{\mathcal{C}}$.

We obtain $\|\mathcal{L}^{p-i}f_i - \int f_i\,d\mu\|_{\mathcal{C}} \le C\rho^{p-i}\sum_{j=0}^{i}\mathrm{Lip}_j(K)\,\rho^{i-j}$. This bound in $\mathcal{C}$ gives in
particular a bound on the supremum for points at height 0, and in particular at the point
$x_p$. Summing those bounds over $i$, we get the desired result exactly as in the proof of
Lemma 2.3. □



The next step of the proof is the following lemma. It is here that the Lipschitz constants
$\mathrm{Lip}_j(K)$ should all be bounded by $\epsilon_0$. As before, let $K_p = \mathbb{E}(K|\mathcal{F}_p)$, and $D_p = K_p - K_{p+1}$.

Lemma 3.3. There exist $\epsilon_0 > 0$, $C_1 > 0$ and $\rho < 1$ such that any function $K(x_0,\dots,x_{n-1})$
with $\mathrm{Lip}_j(K) \le \epsilon_0$ for all $j$ satisfies, for any $p$,

\[ \mathbb{E}(e^{D_p}|\mathcal{F}_{p+1})(x_{p+1},\dots) \le e^{C_1\sum_{j=0}^{p}\mathrm{Lip}_j(K)^2\rho^{p-j}}. \]

Proof. If the height of $x_{p+1}$ is positive, then this point has a unique preimage $y$, and
$D_p(y, x_{p+1}, \dots) = 0$. Therefore, $\mathbb{E}(e^{D_p}|\mathcal{F}_{p+1})(x_{p+1},\dots) = 1$ and the estimate is trivial.

Assume now that $h(x_{p+1}) = 0$. Let us denote by $\{z_\alpha\}$ the preimages of $x_{p+1}$ under
$T$ (with $z_\alpha \in \Delta_{\alpha,\phi(\alpha)-1}$). Let $A(z) = D_p(z, x_{p+1}, \dots)$; we have
$\mathbb{E}(e^{D_p}|\mathcal{F}_{p+1})(x_{p+1},\dots) = \sum g(z_\alpha)\,e^{A(z_\alpha)}$.
Fix a point $z = z_\alpha$, with height $h \ge 0$. If $h \le p$, consider the projection $\pi z$ of $z$ in
the basis of the tower. Since $K_p(z,\dots) = K_{p-h}(\pi z,\dots,z,\dots)$, Lemma 3.2 shows that
$K_p(z,\dots)$ is equal to $\int K(y,\dots,T^{p-h-1}y,\pi z,\dots)\,d\mu(y)$ up to $C\sum_{j=0}^{p-h-1}\mathrm{Lip}_j(K)\,\rho^{p-h-j}$.
Up to an additional error $\sum_{j=p-h}^{p}\mathrm{Lip}_j(K)$, this is equal to $\int K(y,\dots,T^p y,x_{p+1},\dots)\,d\mu(y)$.
Applying again Lemma 3.2 (but to the point $x_{p+1}$), we obtain

\[ |A(z)| = |K_p(z,x_{p+1},\dots) - K_{p+1}(x_{p+1},\dots)| \le C\sum_{j<p-h}\mathrm{Lip}_j(K)\,\rho^{p-h-j} + \sum_{j=p-h}^{p}\mathrm{Lip}_j(K). \]

This estimate is also trivially true if $h > p$ (by convention, one sets $\mathrm{Lip}_j(K) = 0$ for $j < 0$).
In particular, since $\sup \mathrm{Lip}_j(K) \le \epsilon_0$, we always get $|A(z)| \le C_0(h+1)\epsilon_0$ for some $C_0 > 0$
(independent of the value of $\epsilon_0$). Using the inequality $(x_1 + \dots + x_k)^2 \le k\sum x_i^2$, we get

\[ |A(z)|^2 \le C\Bigl(\sum_{j<p-h}\mathrm{Lip}_j(K)\,\rho^{p-h-j}\Bigr)^2 + C(h+1)\sum_{j=p-h}^{p}\mathrm{Lip}_j(K)^2 \le C\sum_{j<p-h}\mathrm{Lip}_j(K)^2\rho^{p-h-j} + C(h+1)\sum_{j=p-h}^{p}\mathrm{Lip}_j(K)^2, \tag{3.2} \]

where we used the Cauchy-Schwarz inequality in the last inequality.

The function $A$ satisfies a neat bound on points $z_\alpha$ with small height, but it is unbounded
on points with large height. Therefore, the Hoeffding-Azuma inequality does not apply (contrary
to the subshift of finite type case). While there are certainly exponential inequalities in the
literature that can handle this situation, it is simpler to reprove everything since we are not
interested in good constants.

We have $|e^A - 1 - A| \le A^2 e^{|A|}$ for any real number $A$. Therefore,

\[ \Bigl|\sum g(z_\alpha)\bigl(e^{A(z_\alpha)} - 1 - A(z_\alpha)\bigr)\Bigr| \le \sum g(z_\alpha)\,A(z_\alpha)^2\,e^{|A(z_\alpha)|}. \]
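The elementary inequality on the exponential used at this step follows from the power series of the exponential; a short verification, given here as a sketch:

```latex
\[
  |e^{A} - 1 - A|
  = \Bigl| \sum_{k \ge 2} \frac{A^k}{k!} \Bigr|
  \le A^2 \sum_{k \ge 0} \frac{|A|^k}{(k+2)!}
  \le A^2 \sum_{k \ge 0} \frac{|A|^k}{k!}
  = A^2 e^{|A|}.
\]
% The last inequality only uses (k+2)! \ge k!.
```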



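In the right hand side, $g(z_\alpha) \le C\mu(\Delta_{\alpha,0})$ by bounded distortion, and $|A(z_\alpha)| \le C_0\epsilon_0(1 + \phi(\alpha))$
as we explained above. Together with (3.2), we get

\[ \sum g(z_\alpha)\,A(z_\alpha)^2\,e^{|A(z_\alpha)|} \le C\sum_{h \ge 0}\mu(\phi = h)\,e^{C_0\epsilon_0 h}\Bigl(\sum_{j<p-h}\mathrm{Lip}_j(K)^2\rho^{p-h-j} + (h+1)\sum_{j=p-h}^{p}\mathrm{Lip}_j(K)^2\Bigr). \]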

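Since the tower has exponential tails, we have $\mu(\phi = h) \le \rho_0^h$ for some $\rho_0 < 1$. If $\epsilon_0$ is small
enough, we get $\mu(\phi = h)\,e^{C_0\epsilon_0 h} \le \rho_1^h$ for some $\rho_1 < 1$. Therefore, in the previous bound, the
coefficient of $\mathrm{Lip}_j(K)^2$ is at most

\[ \sum_{h<p-j}\rho_1^h\,\rho^{p-h-j} + \sum_{h \ge p-j}(h+1)\,\rho_1^h \le C(p-j)\,\rho_2^{p-j} + C\rho_2^{p-j}, \]

for some $\rho_2 < 1$. This is bounded by $C\rho^{p-j}$ for some $\rho < 1$. Hence, we have proved that

\[ \Bigl|\sum g(z_\alpha)\bigl(e^{A(z_\alpha)} - 1 - A(z_\alpha)\bigr)\Bigr| \le C\sum_{j \le p}\mathrm{Lip}_j(K)^2\rho^{p-j}. \]

Since $\sum g(z_\alpha) = 1$ and $\sum g(z_\alpha)A(z_\alpha) = 0$, the left hand side is equal to $|\sum g(z_\alpha)e^{A(z_\alpha)} - 1|$.
Finally,

\[ |\mathbb{E}(e^{D_p}|\mathcal{F}_{p+1})(x_{p+1},\dots)| = \Bigl|\sum g(z_\alpha)\,e^{A(z_\alpha)}\Bigr| \le 1 + C\sum_{j \le p}\mathrm{Lip}_j(K)^2\rho^{p-j} \le e^{C\sum_{j \le p}\mathrm{Lip}_j(K)^2\rho^{p-j}}. \]

This concludes the proof. □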

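Proof of Theorem 3.1. Consider a function $K$ with $\mathrm{Lip}_j(K) \le \epsilon_0$ for all $j$. Using inductively
Lemma 3.3, we get for any $P$

\[ \mathbb{E}\bigl(e^{\sum_{p=0}^{P-1} D_p}\bigr) \le e^{C_1\sum_{p=0}^{P-1}\sum_{j \le p}\mathrm{Lip}_j(K)^2\rho^{p-j}} \le e^{C\sum_j \mathrm{Lip}_j(K)^2}. \]

Since $\sum_{p=0}^{P-1} D_p$ converges to $K - \mathbb{E}(K)$ when $P$ tends to infinity, we obtain
$\mathbb{E}(e^{K - \mathbb{E}(K)}) \le e^{C\sum_j \mathrm{Lip}_j(K)^2}$. This proves the exponential concentration inequality in this case. The general
case follows, as we explained after the statement of the theorem. □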
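The inductive use of Lemma 3.3 in this proof rests on the tower property of conditional expectation. One step of the induction can be sketched as follows (this is the standard argument, spelled out here as an illustration rather than quoted from the text):

```latex
% For p \ge 1, D_p is \mathcal{F}_p-measurable, hence \mathcal{F}_1-measurable since \mathcal{F}_p \subset \mathcal{F}_1.
\[
  \mathbb{E}\bigl( e^{\sum_{p=0}^{P-1} D_p} \bigr)
  = \mathbb{E}\bigl( e^{\sum_{p=1}^{P-1} D_p}\, \mathbb{E}( e^{D_0} \mid \mathcal{F}_1 ) \bigr)
  \le e^{C_1 \mathrm{Lip}_0(K)^2}\, \mathbb{E}\bigl( e^{\sum_{p=1}^{P-1} D_p} \bigr).
\]
% Repeating the argument with \mathcal{F}_2, \mathcal{F}_3, \dots removes D_1, D_2, \dots one at a time,
% each step contributing the factor given by Lemma 3.3 for the corresponding index p.
```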

The exponential concentration inequalities for uniform Young towers with exponential
tails easily extend to invertible situations, as follows. Consider $T_{\mathbb{Z}} : \Delta_{\mathbb{Z}} \to \Delta_{\mathbb{Z}}$ the natural
extension of such a Young tower, with bilateral separation time $s_{\mathbb{Z}}$, and distance
$d_{\mathbb{Z}}(x,y) = \beta^{s_{\mathbb{Z}}(x,y)}$ for some $\beta < 1$.

Theorem 3.4. The transformation $T_{\mathbb{Z}}$ satisfies an exponential concentration inequality.

The proof is exactly the same as the proof of Theorem 2.4, exploiting the result for the
non-invertible transformation.

4. Non-uniform Young towers with polynomial tails 

In this section, we consider Young towers in the sense of [You99], i.e., non-uniform Young
towers. The combinatorial definition is the same as in Section 3; the difference is on the
definition of the separation time (and therefore of the distance) as follows. Let $\Delta_0$ be the
basis of the tower, let $T_0 : \Delta_0 \to \Delta_0$ be the induced map on $\Delta_0$ (i.e., $T_0(x) = T^{\phi(x)}(x)$
where $\phi(x)$ is the return time of $x$ to $\Delta_0$). For $x, y \in \Delta_0$, let $s(x,y)$ be the smallest integer
$n$ such that $T_0^n(x)$ and $T_0^n(y)$ are not in the same partition element. This separation time
is extended to $\Delta$ as follows. For $x, y \in \Delta$, let $s(x,y) = s(\pi x, \pi y)$ if $x$ and $y$ are in the
same partition element, and $s(x,y) = 0$ otherwise. In other words, $s(x,y)$ is the number of
returns to the basis before the trajectories of $x$ and $y$ separate. Finally, the new distance is
$d(x,y) = \beta^{s(x,y)}$ for some $\beta < 1$.

Intuitively, we are now considering maps that are expanding only when one returns to
the basis, and can be isometries between successive returns, while the maps of Section 3 are
always expanding. The setting is not uniformly expanding any more, rather non-uniformly
expanding. For instance, intermittent maps can be modeled using non-uniform Young
towers.

If the tails are not exponential any more, one cannot hope to get exponential
concentration inequalities. If the tails have a moment of order $q \ge 2$, then the moments of order
$2q-2$ of Birkhoff sums are controlled, and this is optimal [MN08, Theorem 3.1]. Our goal in
this section is to generalize this result to a concentration inequality (with the same optimal
moment).

Theorem 4.1. Let $T : \Delta \to \Delta$ be a non-uniform Young tower. Assume that, for some
$q \ge 2$, $\sum \phi(\alpha)^q\,\mu(\Delta_{\alpha,0}) < \infty$. Then $T$ satisfies a polynomial concentration inequality with
moment $2q-2$, i.e., there exists a constant $C > 0$ such that, for any $n \in \mathbb{N}$, for any
separately Lipschitz function $K(x_0,\dots,x_{n-1})$,

\[ \int \Bigl| K(x,\dots,T^{n-1}x) - \int K(y,\dots,T^{n-1}y)\,d\mu(y) \Bigr|^{2q-2} d\mu(x) \le C\Bigl(\sum_{i=0}^{n-1}\mathrm{Lip}_i(K)^2\Bigr)^{q-1}. \]



The proof is considerably more difficult than the arguments in the previous section (and
also than the arguments of [MN08], since the main inequality these arguments rely on, due
to Rio, is of no help in our situation). The general strategy is the same as in the previous
sections: decompose $K - \mathbb{E}(K)$ as $\sum D_p$ where $D_p$ is a martingale difference sequence, obtain
good estimates on $D_p$, and then apply a martingale inequality (in our case, the
Rosenthal-Burkholder inequality) to obtain a bound on $K - \mathbb{E}(K)$. The difficulty comes from the
non-uniform expansion of the map: instead of a uniformly decaying geometric series as in
the previous sections, our estimates will be non-uniform, quantified by the number of visits
to the basis in a definite amount of time.

The rest of this section is devoted to the proof of Theorem 4.1. In particular, we will
always assume that $\Delta$ is a non-uniform Young tower satisfying $\sum \phi(\alpha)^q\,\mu(\Delta_{\alpha,0}) < \infty$ for
some $q \ge 2$.

Remark 4.2. The arguments below also give an exponential concentration inequality in
non-uniform Young towers with exponential tails, thereby strengthening Theorem 3.1. Since
most interesting Young towers with exponential tails are uniform, we will not give further
details in this direction.

4.1. Notations. As usual, the letter $C$ denotes a constant that may change from one
occurrence to the next. Let us also introduce a similar notation for sequences. For $Q \ge 0$, we
will write $c_n^{(Q)}$ for a sequence of nonnegative numbers such that $\sum n^Q c_n^{(Q)} < \infty$, and we will
allow this sequence to change from one line to the other (or even on the same line). We will
also write $d_n^{(Q)}$ for a generic nonincreasing sequence with $\sum n^Q d_n^{(Q)} < \infty$.

If $u_n$ and $v_n$ are sequences, their convolution $u \star v$ is given by $(u \star v)_n = \sum_{k=0}^{n} u_k v_{n-k}$.
One easily checks that, for $Q, Q' \ge 0$,

\[ (c^{(Q)} \star c^{(Q')})_n \le c_n^{(\min(Q,Q'))}. \tag{4.1} \]

Following the above convention, this statement should be understood as follows: if two
sequences $u$ and $v$ satisfy, respectively, $\sum n^Q u_n < \infty$ and $\sum n^{Q'} v_n < \infty$, then $w = u \star v$
satisfies $\sum n^{\min(Q,Q')} w_n < \infty$. Indeed, letting $Q'' = \min(Q,Q')$,

\[ \sum_n n^{Q''} w_n = \sum_{k,\ell}(k+\ell)^{Q''} u_k v_\ell \le \sum_{k,\ell}(k+1)^{Q''}(\ell+1)^{Q''} u_k v_\ell \le \Bigl(\sum_k (k+1)^{Q} u_k\Bigr)\cdot\Bigl(\sum_\ell (\ell+1)^{Q'} v_\ell\Bigr) < \infty. \]



We also have for $Q \ge 1$

\[ \sum_{k=n}^{\infty} c_k^{(Q)} \le d_n^{(Q-1)}. \tag{4.2} \]

Indeed,

\[ \sum_n n^{Q-1}\sum_{k=n}^{\infty} c_k^{(Q)} = \sum_k \Bigl(\sum_{n=0}^{k} n^{Q-1}\Bigr) c_k^{(Q)} \le \sum_k C k^{Q} c_k^{(Q)} < \infty, \]

and the sequence $\sum_{k=n}^{\infty} c_k^{(Q)}$ is nonincreasing.
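The two summability facts (4.1) and (4.2) can be sanity-checked numerically on truncated sequences. The sketch below does so for one hypothetical pair of sequences; the helper names `convolve`, `moment` and `tail_sums`, the sequences `u` and `v`, and the choice $Q = 2$, $Q' = 1$ are assumptions made for this illustration.

```python
def convolve(u, v):
    """Convolution (u * v)_n = sum_{k=0}^{n} u_k v_{n-k} of two finite sequences."""
    w = [0.0] * (len(u) + len(v) - 1)
    for k, uk in enumerate(u):
        for m, vm in enumerate(v):
            w[k + m] += uk * vm
    return w


def moment(seq, q, shift=0):
    """Weighted sum: sum over n of (n + shift)**q * seq[n]."""
    return sum((n + shift) ** q * a for n, a in enumerate(seq))


def tail_sums(c):
    """Sequence n -> sum_{k >= n} c_k, accumulated from the right."""
    tails, running = [], 0.0
    for a in reversed(c):
        running += a
        tails.append(running)
    return tails[::-1]


# Hypothetical test sequences with finite moments of order Q = 2 and Q' = 1.
u = [1.0 / (k + 1) ** 4 for k in range(200)]
v = [1.0 / (k + 1) ** 3 for k in range(200)]
w = convolve(u, v)

# (4.1): the moment of order min(Q, Q') = 1 of u * v is below the product bound.
lhs_41 = moment(w, 1)
rhs_41 = moment(u, 2, shift=1) * moment(v, 1, shift=1)

# (4.2) with Q = 2: the tail sums of u have a finite moment of order Q - 1 = 1.
tails = tail_sums(u)
lhs_42 = moment(tails, 1)
rhs_42 = moment(u, 2)
```

A finite computation of this kind only illustrates the inequalities behind (4.1) and (4.2); the statements themselves concern summability of infinite sequences.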

4.2. Renewal sequences of operators, estimates on the returns to the basis. An
important tool for our study will be renewal sequences of operators, as developed by Sarig
and Gouëzel [Sar02, Gou04b, Gou04c], that we will now quickly describe.

Consider a function $f$; we wish to understand $\mathcal{L}^n f(x) = \sum_{T^n y=x} g^{(n)}(y) f(y)$ for $x \in \Delta_0$.
For a preimage $y$ of $x$ under $T^n$, we can consider its first entrance into $\Delta_0$, and then its
successive returns to $\Delta_0$. We obtain a decomposition

\[ 1_{\Delta_0}\mathcal{L}^n = \sum_{k+b=n} T_k B_b, \tag{4.3} \]

where $T_k$ takes the successive returns to $\Delta_0$ (during time $k$) into account, and $B_b$ deals with
the part of the trajectory outside $\Delta_0$. Formally, for $x \in \Delta_0$, $T_k f(x) = \sum g^{(k)}(y) f(y)$ where
the sum is restricted to those $y$ such that $T^k y = x$ and $y \in \Delta_0$. The operator $B_b$, in turn, is
given on $\Delta_0$ by $B_b f(x) = \sum g^{(b)}(y) f(y)$ where the sum is restricted to those $y$ with $T^b y = x$
and $y,\dots,T^{b-1}y \notin \Delta_0$.

The operators $B_b$ are essentially trivial to understand, their behavior being controlled
by the tails of the return time function $\phi$. On the other hand, the operators $T_k$ embody
most of the dynamics of the transformation. To understand them, we introduce yet another
operator $R_j$ considering only the first return to $\Delta_0$ at time $j$, i.e., $R_j f(x) = \sum g^{(j)}(y) f(y)$
where the sum is restricted to those $y$ such that $T^j y = x$ and $y \in \Delta_0$, $Ty,\dots,T^{j-1}y \notin \Delta_0$.
Splitting a trajectory into its successive excursions outside of $\Delta_0$, one obtains

\[ T_k = \sum_{\ell \ge 1}\;\sum_{j_1+\dots+j_\ell=k} R_{j_1}\cdots R_{j_\ell}. \]

Formally, this equation can be written as

\[ \sum T_k z^k = \Bigl(I - \sum R_j z^j\Bigr)^{-1}. \tag{4.4} \]

In fact, the series defined in this equation are holomorphic for $|z| < 1$ (as operators acting
on the space $\mathcal{C}$ of Lipschitz functions on $\Delta_0$) and this equality is a true equality between
holomorphic functions. Moreover, the spectral radius of $\sum R_j z^j$ is at most 1 for $|z| \le 1$.
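The renewal equation (4.4) has a classical scalar counterpart, in which the operators $R_j$ are replaced by the probabilities $r_j$ of a first return at time $j$. The sketch below computes the corresponding sequence $t_k$ from the recursion $t_0 = 1$, $t_k = \sum_{j=1}^{k} r_j t_{k-j}$, which is equivalent to $\sum t_k z^k = (1 - \sum r_j z^j)^{-1}$. The function name and the example distribution are assumptions of this illustration; the operator-valued case treated in the text is substantially harder.

```python
def renewal_sequence(r, n_max):
    """Scalar renewal sequence t_0, ..., t_{n_max}.

    `r[j]` is the probability of a first return at time j (with r[0] = 0).
    The recursion t_k = sum_{j=1}^{k} r_j t_{k-j}, t_0 = 1, encodes
    sum_k t_k z^k = 1 / (1 - sum_j r_j z^j).
    """
    t = [1.0]
    for k in range(1, n_max + 1):
        t.append(sum(r[j] * t[k - j] for j in range(1, min(k, len(r) - 1) + 1)))
    return t


# Hypothetical aperiodic example: return at time 1 or 2 with probability 1/2 each.
r = [0.0, 0.5, 0.5]
t = renewal_sequence(r, 60)
mean_return_time = sum(j * rj for j, rj in enumerate(r))  # equals 1.5
```

In this example `t[k]` approaches `1 / mean_return_time = 2/3`, in line with the classical renewal theorem for aperiodic return times.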

A powerful way to use the previous equality is Banach algebra techniques. Simple examples of Banach algebras are given by Banach spaces $\mathcal{B}$ of sequences $c_n$ such that, if $(c_n)_{n \in \mathbb{N}} \in \mathcal{B}$ and $(c'_n)_{n \in \mathbb{N}} \in \mathcal{B}$, then their convolution $c \star c'$ still belongs to $\mathcal{B}$. For instance, this is the case of sequences with a moment of order $Q \ge 0$ (by (4.1)), or of sequences satisfying $c_n = O(1/n^Q)$ for some $Q > 1$. Given such a Banach algebra of sequences $\mathcal{B}$, one can consider the Banach algebra $\widetilde{\mathcal{B}}$ of sequences of operators $(M_n)_{n \in \mathbb{N}}$ (acting on some fixed Banach space $\mathcal{C}$) such that the sequence $(\|M_n\|)_{n \in \mathbb{N}}$ belongs to $\mathcal{B}$. One easily checks that $\widetilde{\mathcal{B}}$ is again a Banach algebra (for the convolution product).
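
The stability under convolution of sequences with $c_n = O(1/n^Q)$ can be observed numerically. The following sketch (the helper names and the choice $Q = 2$ are illustrative assumptions) computes $n^Q (c \star c)_n$ for $c_0 = 1$, $c_n = n^{-Q}$; the quantity stays bounded, which is the property making these sequences a Banach algebra.

```python
def poly_sequence(Q, n):
    """Return [c_0, ..., c_{n-1}] with c_0 = 1 and c_k = k^(-Q) for k >= 1."""
    return [1.0] + [k ** (-Q) for k in range(1, n)]


def convolve(c, d):
    """Convolution (c * d)_m = sum_{k=0..m} c_k d_{m-k}, for m below the shorter length."""
    n = min(len(c), len(d))
    return [sum(c[k] * d[m - k] for k in range(m + 1)) for m in range(n)]


def normalized_convolution(Q, n):
    """Values m^Q * (c * c)_m for 1 <= m < n, where c = poly_sequence(Q, n)."""
    c = poly_sequence(Q, n)
    cc = convolve(c, c)
    return [cc[m] * m ** Q for m in range(1, n)]
```

For $Q = 2$ an elementary estimate (splitting the sum at $k = n/2$) bounds the normalized values by $2 + 8\zeta(2) < 16$.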

When the Banach algebra of sequences $\mathcal{B}$ satisfies a technical condition (its characters should all be given by evaluation of the power series $\sum c_n z^n$ at a point $z$ of the unit disk), which is satisfied in all examples we mentioned above, then one can use the Wiener lemma to obtain the following property: if a sequence of operators $(M_n)_{n \in \mathbb{N}}$ belongs to $\widetilde{\mathcal{B}}$ and $\sum M_n z^n$ is invertible as an operator on $\mathcal{C}$ for any $z$ in the closed unit disk, then $(M_n)_{n \in \mathbb{N}}$ is invertible in $\widetilde{\mathcal{B}}$. In particular, the power series $\sum M'_n z^n = (\sum M_n z^n)^{-1}$ satisfies $(\|M'_n\|)_{n \in \mathbb{N}} \in \mathcal{B}$.

Using Banach algebra arguments and the renewal equation (4.4), the following proposition is proved in [Gou04c, Proposition 2.2.19].

Proposition 4.3. Consider a Banach algebra of sequences $\mathcal{B}$ satisfying several technical conditions. If the sequence $(\sum_{k>n} \mu(\phi = k))_{n \in \mathbb{N}}$ belongs to $\mathcal{B}$, then this is also the case of the sequence $(\|T_{n+1} - T_n\|)_{n \in \mathbb{N}}$. Moreover, $T_n$ converges to $\Pi : f \mapsto (\int_{\Delta_0} f) 1_{\Delta_0}$.

The technical conditions on the Banach algebra (all the characters of $\mathcal{B}$ should be given by the evaluation at a point of the closed unit disk, and the symmetrized version of $\mathcal{B}$ should contain the Fourier coefficients of partitions of unity of the circle) will not be important for us; let us only mention that they are satisfied for the Banach algebras of series with moments of order $Q \ge 0$.

The contraction properties of the dynamics T are dictated by the number of returns to 
the basis. Their asymptotics are estimated in the next lemma. 

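Lemma 4.4. For $x \in \Delta$, let $\psi_n(x) = \mathrm{Card}\{0 \le k \le n-1 : T^k x \in \Delta_0\}$ be the number of visits to the basis of $x$ before time $n$, and let $\Psi_n(x) = \rho^{\psi_n(x)}$, where $\rho < 1$. If the return time on $\Delta_0$ has a moment of order $q \ge 1$ (i.e., $\mu(\phi = n) \le c_n^{(q)}$), we have

\[ \int_{T^{-n}\Delta_0} \Psi_n(x) \, d\mu(x) \le c_n^{(q-1)}. \]

This bound is optimal: on $\Delta_{\alpha, \phi(\alpha)-n}$ (for $\alpha$ with $\phi(\alpha) > n$), we have $\Psi_n = 1$. Therefore, the integral in the lemma is bounded from below by $\mu(\bigcup_{\phi(\alpha) > n} \Delta_{\alpha,0}) \sim \sum_{i=n+1}^{\infty} c_i^{(q)} \sim c_n^{(q-1)}$.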

Proof. Let us define an operator $U_n$ by the series $\sum U_n z^n = \sum_{k=0}^{\infty} (\rho \sum R_n z^n)^k = (I - \rho \sum R_n z^n)^{-1}$. Then $U_n f(x) = \sum g^{(n)}(y) \Psi_n(y) f(y)$, where the sum is restricted to those $y \in \Delta_0$ with $T^n y = x$. Integrating and changing variables, we obtain

\[ \int_{\Delta_0} U_n 1(x) \, d\mu(x) = \int_{\Delta_0 \cap T^{-n}(\Delta_0)} \Psi_n(y) \, d\mu(y). \]

Since the spectral radius of $\sum R_n z^n$ is at most 1 for $|z| \le 1$, it follows that $I - \rho \sum R_n z^n$ is invertible on $\mathcal{C}$ (since $\rho < 1$). Moreover, the sequence $\|R_n\|$ satisfies $\|R_n\| \le C \mu(\phi = n) \le c_n^{(q)}$. It follows from Wiener's Lemma that $\sum U_n z^n = (I - \rho \sum R_n z^n)^{-1}$ belongs to the same Banach algebra of operators, i.e., $\|U_n\| \le c_n^{(q)}$. We obtain

\[ \int_{\Delta_0 \cap T^{-n}\Delta_0} \Psi_n(y) \, d\mu(y) \le c_n^{(q)}. \]

To study the integral of $\Psi_n$ on $T^{-n}\Delta_0$, denote by $\Lambda_b$ the set of points in $\Delta$ that enter $\Delta_0$ exactly at time $b$. On $\Lambda_b$, we have $\Psi_n(y) = \Psi_{n-b}(T^b y)$. A distortion control gives

\[ \int_{\Lambda_b \cap T^{-n}\Delta_0} \Psi_n \le C \mu(\Lambda_b) \int_{\Delta_0 \cap T^{-(n-b)}\Delta_0} \Psi_{n-b} \le C \mu(\Lambda_b) \, c_{n-b}^{(q)}. \]

Moreover, for $b > 0$, $\Lambda_b = \bigcup_{\phi(\alpha) > b} \Delta_{\alpha, \phi(\alpha)-b}$, hence $\mu(\Lambda_b) \le \sum_{\ell > b} c_\ell^{(q)} \le c_b^{(q-1)}$. We obtain

\[ \int_{T^{-n}\Delta_0} \Psi_n(y) \, d\mu(y) = \sum_{b=0}^{n} \int_{\Lambda_b \cap T^{-n}\Delta_0} \Psi_n(y) \, d\mu(y) \le C \sum_{b=0}^{n} c_b^{(q-1)} c_{n-b}^{(q)}. \]

By (4.1), this is bounded by $c_n^{(q-1)}$. □
4.3. Bounding $D_p$. To follow the same strategy as in the previous sections, we need to show that $K_p$ is close to an integral, as in Lemma 2.3. To do so, as in the proof of this lemma, we define a function $f_i$ as in (2.1), and control its iterates under the transfer operator. The first step is to control its Lipschitz constant.

Lemma 4.5. For $z$ and $z'$ with zero height, $|f_i(z)| \le C \mathrm{Lip}_i(K)$ and

\[ |f_i(z) - f_i(z')| \le C d(z, z') \sum_{j=0}^{i} \mathrm{Lip}_j(K) \, c_{i-j}^{(q-1)}. \]

Proof. The inequality $|f_i(z)| \le C \mathrm{Lip}_i(K)$ is trivial. To control the Lipschitz constant, as in (2.2), we decompose

\[ f_i(z) - f_i(z') = \sum \bigl( g^{(i)}(y) - g^{(i)}(y') \bigr) H(y, \dots, T^i y) + \sum g^{(i)}(y') \bigl( H(y, \dots, T^i y) - H(y', \dots, T^i y') \bigr). \]

Using distortion controls, we bound the first sum by $C \mathrm{Lip}_i(K) d(z, z')$. For the second sum, we replace successively each $T^j y$ with $T^j y'$, writing it as

\[ \sum_{T^i y' = z'} \sum_{j=0}^{i} g^{(i)}(y') \bigl( H(y, \dots, T^{j-1} y, T^j y, T^{j+1} y', \dots, T^i y') - H(y, \dots, T^{j-1} y, T^j y', T^{j+1} y', \dots, T^i y') \bigr). \]

Since the distance between $T^j y$ and $T^j y'$ is bounded by $\Psi_{i-j}(T^j y') d(z, z')$, we obtain a bound

\[ \sum_{T^i y' = z'} \sum_{j=0}^{i} g^{(i)}(y') \Psi_{i-j}(T^j y') \mathrm{Lip}_j(K) d(z, z') \le d(z, z') \sum_{j=0}^{i} \sum_{T^{i-j} y' = z'} g^{(i-j)}(y') \Psi_{i-j}(y') \mathrm{Lip}_j(K) \le C d(z, z') \sum_{j=0}^{i} \mathrm{Lip}_j(K) \int_{T^{-(i-j)}\Delta_0} \Psi_{i-j} \]

by bounded distortion. With Lemma 4.4, this gives the result. □

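To follow the strategy of proof of Lemma 2.3, we need to understand the iterates of $f_i$ under the transfer operator. This is done in the next lemma.

Lemma 4.6. For any $r \ge 0$ and any $z \in \Delta_0$, we have

\[ \Bigl| \mathcal{L}^r f_i(z) - \int f_i \Bigr| \le \sum_{j=0}^{i} \mathrm{Lip}_j(K) \Bigl( \sum_{k=0}^{r} c_k^{(q-2)} c_{i-j+r-k}^{(q-1)} \Bigr). \]

Proof. We will use the decomposition $1_{\Delta_0} \mathcal{L}^r = \sum_{k+b=r} T_k B_b$ given by (4.3) to understand $\mathcal{L}^r f_i$.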

Let us first describe the asymptotics of $T_k$. Let $\mathcal{C}$ denote the space of Lipschitz functions on the basis $\Delta_0$ of the tower. We define an operator $\Pi$ on $\mathcal{C}$ by $\Pi f = (\int_{\Delta_0} f) 1_{\Delta_0}$. The operators $T_n$ converge to $\Pi$. Since $\|T_n - T_{n+1}\| \le c_n^{(q-1)}$ by Proposition 4.3, we have

\[ \|T_k - \Pi\| \le \sum_{n=k}^{\infty} \|T_n - T_{n+1}\| \le \sum_{n=k}^{\infty} c_n^{(q-1)} \le c_k^{(q-2)} \tag{4.5} \]

by (4.2).

We will now estimate $B_b f_i$ using Lemma 4.5. For $z \in \Delta_0$, we have

\[ B_b f_i(z) = \sum_{\phi(\alpha) > b} g^{(b)}(z_\alpha) f_i(z_\alpha), \]

where $z_\alpha$ is the unique preimage of $z$ under $T^b$ in $\Delta_{\alpha, \phi(\alpha)-b}$. We have

\[ |B_b f_i|_\infty \le |f_i|_\infty \cdot C \sum_{\phi(\alpha) > b} \mu(\Delta_{\alpha,0}) \le C |f_i|_\infty \, c_b^{(q-1)} \le C \mathrm{Lip}_i(K) \, c_b^{(q-1)}. \tag{4.6} \]

Let us now estimate $B_b f_i(z) - B_b f_i(z')$ for $z$ and $z'$ in the same partition element. If we form the difference $g^{(b)}(z_\alpha) - g^{(b)}(z'_\alpha)$, the resulting term is bounded by $C d(z, z') \mathrm{Lip}_i(K) c_b^{(q-1)}$ (using distortion controls and the same computation as in (4.6)). On the other hand, denoting by $h_\alpha = \phi(\alpha) - b$ the height of $z_\alpha$, we have

\[ |f_i(z_\alpha) - f_i(z'_\alpha)| \le C \Bigl( \sum_{j=0}^{i-h_\alpha} \mathrm{Lip}_j(K) \, c_{i-j-h_\alpha}^{(q-1)} + \sum_{j=i-h_\alpha+1}^{i} \mathrm{Lip}_j(K) \Bigr) d(z, z'). \]

This follows from Lemma 4.5 applied to the function $f_{i-h_\alpha}$ and the points $\pi z_\alpha$ and $\pi z'_\alpha$. Summing over $\alpha$, we obtain a bound for the Lipschitz constant of $B_b f_i$ of the form



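\[ C \sum_{\phi(\alpha) > b} g^{(b)}(z_\alpha) \Bigl( \sum_{j=0}^{i-h_\alpha} \mathrm{Lip}_j(K) \, c_{i-j-h_\alpha}^{(q-1)} + \sum_{j=i-h_\alpha+1}^{i} \mathrm{Lip}_j(K) \Bigr). \]

By bounded distortion, $g^{(b)}(z_\alpha) \le C \mu(\Delta_{\alpha,0})$. Taking the union over $\alpha$ and writing $\ell = \phi(\alpha)$ (so that $h_\alpha = \ell - b$), we get that the coefficient of $\mathrm{Lip}_j(K)$ in this sum is bounded by

\[ C \sum_{\ell=b}^{b+i-j} \mu(\phi = \ell) \, c_{i-j-(\ell-b)}^{(q-1)} + C \sum_{\ell=b+i-j+1}^{\infty} \mu(\phi = \ell). \]

The second term is bounded by $c_{i-j+b}^{(q-1)}$ by (4.2), while the first term is bounded by

\[ \sum_{\ell=b}^{i-j+b} c_\ell^{(q)} c_{i-j+b-\ell}^{(q-1)} \le c_{i-j+b}^{(q-1)} \]

by (4.1). We have shown that

\[ \|B_b f_i\|_{\mathcal{C}} \le \sum_{j=0}^{i} \mathrm{Lip}_j(K) \, c_{i-j+b}^{(q-1)}. \]

(The contribution of (4.6) is compatible with this bound.)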



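Let us now study $\mathcal{L}^r f_i$ on $\Delta_0$. We write $T_k = \Pi + E_k$ with $\|E_k\| \le c_k^{(q-2)}$, by (4.5). Hence,

\[ 1_{\Delta_0} \mathcal{L}^r f_i = \sum_{k+b=r} T_k B_b f_i = \sum_{k+b=r} \Pi B_b f_i + \sum_{k+b=r} E_k B_b f_i. \tag{4.7} \]

The first term is a constant function equal to $\sum_{b=0}^{r} \int_{\Delta_0} B_b f_i$. Denoting by $\Lambda_b$ the set of points that enter $\Delta_0$ exactly at time $b$, we have $\int_{\Delta_0} B_b f_i = \int_{\Lambda_b} f_i$. As a consequence

\[ \Bigl| \sum_{b=0}^{r} \int_{\Delta_0} B_b f_i - \int f_i \Bigr| = \Bigl| \int_{\bigcup_{b>r} \Lambda_b} f_i \Bigr| \le |f_i|_\infty \sum_{b>r} \mu(\Lambda_b) \le \mathrm{Lip}_i(K) \sum_{b>r} c_b^{(q-1)} \le \mathrm{Lip}_i(K) \, c_r^{(q-2)} \]

by (4.2). This bound is compatible with the statement of the lemma. The second term of (4.7) is bounded (in $\mathcal{C}$ norm, thus in sup norm) by

\[ \sum_{k+b=r} \|E_k\| \, \|B_b f_i\|_{\mathcal{C}} \le \sum_{j=0}^{i} \mathrm{Lip}_j(K) \sum_{k+b=r} c_k^{(q-2)} c_{i-j+b}^{(q-1)}. \]

This proves the lemma. □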
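We can now obtain the following lemma, which is the analogue in our setting of Lemma 2.3.

Lemma 4.7. For all $x_p \in \Delta_0$,

\[ \Bigl| K_p(x_p, \dots) - \int K(y, \dots, T^{p-1} y, x_p, \dots) \, d\mu(y) \Bigr| \le \sum_{j=0}^{p-1} \mathrm{Lip}_j(K) \, c_{p-j}^{(q-2)}. \]

Proof. Just like in the proof of Lemma 2.3,

\[ K_p(x_p, \dots) - \int K(y, \dots, T^{p-1} y, x_p, \dots) \, d\mu(y) = \sum_{i=0}^{p-1} \Bigl( \mathcal{L}^{p-i} f_i(x_p) - \int f_i \Bigr). \]

By Lemma 4.6, this quantity is bounded by

\[ \sum_{i=0}^{p-1} \sum_{j=0}^{i} \mathrm{Lip}_j(K) \Bigl( \sum_{k=0}^{p-i} c_k^{(q-2)} c_{p-j-k}^{(q-1)} \Bigr). \]

The coefficient of $\mathrm{Lip}_j(K)$ in this sum is at most

\[ \sum_{k=0}^{p-j} (p-j-k+1) \, c_k^{(q-2)} c_{p-k-j}^{(q-1)} \le \sum_{k=0}^{p-j} c_k^{(q-2)} c_{p-k-j}^{(q-2)} \le c_{p-j}^{(q-2)} \]

by (4.1). This proves the lemma. □

The previous lemma makes it possible to control the moments of $D_p = K_p - K_{p+1}$.

Lemma 4.8. For all $\kappa \le 2q$,

\[ \mathbb{E}(|D_p|^\kappa \mid \mathcal{F}_{p+1})(x_{p+1}, \dots) \le C \sum_{j=0}^{p} \mathrm{Lip}_j(K)^\kappa \, c_{p-j}^{(q-2)} + C \sum_{h \ge 0} c_h^{(q-\kappa/2)} \Bigl( \sum_{j=p-h+1}^{p} \mathrm{Lip}_j(K)^2 \Bigr)^{\kappa/2}. \]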

Proof. We follow closely the strategy of the proof of Lemma 3.3. If the height of $x_{p+1}$ is positive, the estimate is trivial. Otherwise, let $\{z_\alpha\}$ denote the preimages of $x_{p+1}$ under $T$, with respective height $h_\alpha = \phi(\alpha) - 1$. Let $A(z) = D_p(z, x_{p+1}, \dots)$; we have

\[ \mathbb{E}(|D_p|^\kappa \mid \mathcal{F}_{p+1})(x_{p+1}, \dots) = \sum g(z_\alpha) |A(z_\alpha)|^\kappa. \]

Fix a point $z = z_\alpha$ with height $h \ge 0$. If $h \le p$, consider the projection $\pi z$ of $z$ in the basis of the tower. Using Lemma 4.7 (at time $p-h$ for the point $\pi z$, and at time $p+1$ for the point $x_{p+1}$), we get

\[ |A(z)| \le \sum_{j \le p-h} \mathrm{Lip}_j(K) \, c_{p-h-j}^{(q-2)} + \sum_{j=p-h+1}^{p} \mathrm{Lip}_j(K). \tag{4.8} \]

This estimate also holds (trivially) if $h > p$.

To estimate $|A(z)|^\kappa$, we first use the inequality $(x+y)^\kappa \le C x^\kappa + C y^\kappa$ to separate the two sums. Then, in the first sum, since $c_{p-h-j}^{(q-2)}$ is summable, we may use Hölder inequality to get $\bigl( \sum_{j \le p-h} \mathrm{Lip}_j(K) c_{p-h-j}^{(q-2)} \bigr)^\kappa \le C \sum_{j \le p-h} \mathrm{Lip}_j(K)^\kappa c_{p-h-j}^{(q-2)}$. For the second sum, we write $\bigl( \sum_{j=p-h+1}^{p} \mathrm{Lip}_j(K) \bigr)^2 \le h \sum_{j=p-h+1}^{p} \mathrm{Lip}_j(K)^2$, and we obtain

\[ |A(z)|^\kappa \le C \sum_{j \le p-h} \mathrm{Lip}_j(K)^\kappa \, c_{p-h-j}^{(q-2)} + C h^{\kappa/2} \Bigl( \sum_{j=p-h+1}^{p} \mathrm{Lip}_j(K)^2 \Bigr)^{\kappa/2}. \]

Summing over $\alpha$, we get that $\sum g(z_\alpha) |A(z_\alpha)|^\kappa$ is at most

\[ C \sum_{h=0}^{\infty} \mu(\phi = h) \Bigl[ \sum_{j \le p-h} \mathrm{Lip}_j(K)^\kappa \, c_{p-h-j}^{(q-2)} + h^{\kappa/2} \Bigl( \sum_{j=p-h+1}^{p} \mathrm{Lip}_j(K)^2 \Bigr)^{\kappa/2} \Bigr]. \]

In the first sum, the coefficient of $\mathrm{Lip}_j(K)^\kappa$ is at most

\[ \sum_{h=0}^{p-j} c_h^{(q)} c_{p-h-j}^{(q-2)} \le c_{p-j}^{(q-2)} \]

by (4.1). In the second sum, $\mu(\phi = h) h^{\kappa/2} \le c_h^{(q-\kappa/2)}$, yielding the statement of the lemma. □



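4.4. Proof of Theorem 4.1. We will use the following Rosenthal-Burkholder martingale inequality [Bur73, Theorem 21.1 and Inequality (21.5)]. Let $\mathcal{F}_p$ be a decreasing sequence of $\sigma$-algebras, and let $D_p$ be a sequence of reverse martingale differences with respect to $\mathcal{F}_p$ (i.e., $D_p$ is $\mathcal{F}_p$-measurable and $\mathbb{E}(D_p \mid \mathcal{F}_{p+1}) = 0$). For all $Q \ge 2$,

\[ \Bigl\| \sum D_p \Bigr\|_{L^Q}^{Q} \le C \, \mathbb{E} \Bigl[ \Bigl( \sum \mathbb{E}(D_p^2 \mid \mathcal{F}_{p+1}) \Bigr)^{Q/2} \Bigr] + C \sum \mathbb{E}(|D_p|^Q). \]

We apply this inequality to $\mathcal{F}_p$ the $\sigma$-algebra of sets depending only on $x_p, x_{p+1}, \dots$, to $D_p = K_p - K_{p+1}$ and to $Q = 2q-2$. By Lemma 4.8 with $\kappa = 2$, we have

\[ \mathbb{E}(D_p^2 \mid \mathcal{F}_{p+1})(x_{p+1}, \dots) \le C \sum_{j=0}^{p} \mathrm{Lip}_j(K)^2 \, c_{p-j}^{(q-2)} + C \sum_{h \ge 0} c_h^{(q-1)} \sum_{j=p-h+1}^{p} \mathrm{Lip}_j(K)^2. \tag{4.9} \]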

j'=0 h>0 j=p— h+1 
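The coefficient of $\mathrm{Lip}_j(K)^2$ in this estimate is bounded by $c_{p-j}^{(q-2)} + \sum_{h \ge p-j+1} c_h^{(q-1)} \le c_{p-j}^{(q-2)}$. Hence, the first term in Rosenthal-Burkholder inequality is bounded by

\[ C \Bigl( \sum_{p} \sum_{j=0}^{p} \mathrm{Lip}_j(K)^2 \, c_{p-j}^{(q-2)} \Bigr)^{q-1} \le C \Bigl( \sum_{j} \mathrm{Lip}_j(K)^2 \Bigr)^{q-1}. \]

For the second term, we should bound $\sum_p \mathbb{E}(|D_p|^{2q-2})$. We sum the estimates of Lemma 4.8 (with $\kappa = 2q-2$), to get

\[ \sum_{p} \mathbb{E}(|D_p|^{2q-2}) \le C \sum_{j} \mathrm{Lip}_j(K)^{2q-2} \sum_{p \ge j} c_{p-j}^{(q-2)} + C \sum_{h \ge 0} c_h^{(1)} \sum_{p} \Bigl( \sum_{j=p-h+1}^{p} \mathrm{Lip}_j(K)^2 \Bigr)^{q-1}. \tag{4.10} \]

In the first sum, the coefficient of $\mathrm{Lip}_j(K)^{2q-2}$ is $\sum_k c_k^{(q-2)} \le C$, therefore this sum is bounded by $C \sum_j \mathrm{Lip}_j(K)^{2q-2} \le C (\sum_j \mathrm{Lip}_j(K)^2)^{q-1}$.

The second sum is more delicate. Let us fix $h$ and $p_0 \in [0, h)$, and let us consider the contribution of those $p$ in $p_0 + \mathbb{Z}h$. The intervals $[p-h+1, p]$ are disjoint. The inequality $\sum x_i^{q-1} \le (\sum x_i)^{q-1}$ yields

\[ \sum_{p \equiv p_0 \, [h]} \Bigl( \sum_{j=p-h+1}^{p} \mathrm{Lip}_j(K)^2 \Bigr)^{q-1} \le \Bigl( \sum_{p \equiv p_0 \, [h]} \sum_{j=p-h+1}^{p} \mathrm{Lip}_j(K)^2 \Bigr)^{q-1} \le \Bigl( \sum_{j} \mathrm{Lip}_j(K)^2 \Bigr)^{q-1}. \]

Summing over the $h$ possible values of $p_0$, we get that the second sum of (4.10) is bounded by

\[ C \sum_{h \ge 0} c_h^{(1)} h \Bigl( \sum_{j} \mathrm{Lip}_j(K)^2 \Bigr)^{q-1} \le C \Bigl( \sum_{j} \mathrm{Lip}_j(K)^2 \Bigr)^{q-1}, \]

since $\sum h c_h^{(1)} < \infty$ by definition.

We have proved that $\| \sum D_p \|_{L^{2q-2}}^{2q-2} \le C (\sum_j \mathrm{Lip}_j(K)^2)^{q-1}$. Since $\sum D_p = K - \mathbb{E}(K)$, this proves Theorem 4.1. □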
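
The residue-class argument used for the second sum of (4.10) amounts to the elementary inequality $\sum_p (\sum_{j=p-h+1}^{p} a_j)^{s} \le h (\sum_j a_j)^{s}$ for nonnegative $a_j$ and $s \ge 1$. The sketch below checks it on arbitrary data; the function name and the test sequence are illustrative assumptions, with $a_j$ playing the role of $\mathrm{Lip}_j(K)^2$ and $s$ the role of $q-1$.

```python
def window_power_sum(a, h, s):
    """Sum over p of (a_{p-h+1} + ... + a_p)^s, with a_j = 0 outside the list.

    Only the windows [p-h+1, p] meeting the index range of `a` contribute.
    """
    n = len(a)
    prefix = [0.0]
    for x in a:
        prefix.append(prefix[-1] + x)
    total = 0.0
    for p in range(n + h - 1):
        lo = max(p - h + 1, 0)   # clip the window to the support of a
        hi = min(p, n - 1)
        total += (prefix[hi + 1] - prefix[lo]) ** s
    return total


def residue_class_bound(a, h, s):
    """Right-hand side h * (sum_j a_j)^s given by the residue-class argument."""
    return h * sum(a) ** s
```

The factor $h$ is exactly what is compensated by the summability of $h c_h^{(1)}$ in the proof.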

5. INVERTIBLE NON-UNIFORM YOUNG TOWERS 

Let $T : X \to X$ be a non-uniform Young tower, with invariant measure $\mu$. Its natural extension $T_{\mathbb{Z}} : X_{\mathbb{Z}} \to X_{\mathbb{Z}}$ preserves a probability measure $\mu_{\mathbb{Z}}$. There is a natural distance on $X_{\mathbb{Z}}$, defined as follows. First, the positive separation time $s(x, y)$ is defined as for $T$. One can also define a negative separation time $s_-(x, y)$ in the same way, but towards the past: one iterates towards the past until the points are in different elements of the Markov partition, and one counts the number of visits to $\Delta_0$ in between. The distance $d_{\mathbb{Z}}$ is then defined by $d_{\mathbb{Z}}(x, y) = \beta^{\min(s(x,y), s_-(x,y))}$. Geometrically, this distance is interpreted as follows: when one returns to the basis, there is uniform contraction along stable manifolds (corresponding to the past), and uniform expansion along unstable manifolds. Two points are close in the unstable direction if they remain close in the future for a long time (distance $\beta^{s(x,y)}$), while they are close in the stable direction if they have a long common past (distance $\beta^{s_-(x,y)}$).

Theorem 5.1. Let $(T_{\mathbb{Z}}, X_{\mathbb{Z}}, \mu_{\mathbb{Z}})$ be the natural extension of a Young tower in which the return time function $\phi$ has a moment of order $q$. This system satisfies a concentration inequality with moment $2q-2$, i.e., there exists a constant $C > 0$ such that, for any $n \in \mathbb{N}$, for any function $K_{\mathbb{Z}}(x_0, \dots, x_{n-1})$ which is separately Lipschitz for the distance $d_{\mathbb{Z}}$,

\[ \int \Bigl| K_{\mathbb{Z}}(x, \dots, T_{\mathbb{Z}}^{n-1} x) - \int K_{\mathbb{Z}}(y, \dots, T_{\mathbb{Z}}^{n-1} y) \, d\mu_{\mathbb{Z}}(y) \Bigr|^{2q-2} d\mu_{\mathbb{Z}}(x) \le C \Bigl( \sum_{j=0}^{n-1} \mathrm{Lip}_j(K_{\mathbb{Z}})^2 \Bigr)^{q-1}. \]

This implies Theorem 4.1 (if one considers a function $K_{\mathbb{Z}}$ depending only on the future of the points), but the converse is not true: since the contraction is not uniform, we are not able to reduce this theorem to Theorem 4.1, contrary to what we have done for subshifts of finite type or uniform Young towers.

For the proof, we will work with the non-invertible system $X$, or rather with an augmented space $X_* = X \cup \{x_*\}$ where $x_*$ is a new point (at distance 1 of any point of $X$, with zero measure).

Let us start with a function $K_{\mathbb{Z}}$ on $X_{\mathbb{Z}}$, depending on the past and the future of points. We define a new function $K$ on $X_*^n$ as follows. We let $K(x_0, \dots, x_{n-1}) = K_{\mathbb{Z}}(y_0, \dots, y_{n-1})$ where the $y_i$ are defined inductively. For each element $a$ of the partition, let us fix an admissible past $p(a)$. Let us also fix a point $y_* \in X_{\mathbb{Z}}$. Let $y_0 = (p((x_0)_0), x_0)$ (unless $x_0 = x_*$, in which case let $y_0 = y_*$). If $y_{i-1}$ is defined, let us define $y_i$. If $x_i = x_*$, we take $y_i = y_*$. If the transition from $(x_{i-1})_0$ to $(x_i)_0$ is not permitted, let $y_i = (p((x_i)_0), x_i)$. Otherwise, let $y_i = ((y_{i-1})_{-\infty}^{0}, x_i)$.

We claim that this function $K$ satisfies an inequality

\[ \int \Bigl| K(x, \dots, T^{n-1} x) - \int K(y, \dots, T^{n-1} y) \, d\mu(y) \Bigr|^{2q-2} d\mu(x) \le C \Bigl( \sum_{i=0}^{n-1} \mathrm{Lip}_i(K_{\mathbb{Z}})^2 \Bigr)^{q-1}. \tag{5.1} \]

This implies Theorem 5.1, by using the same argument as in Subsection 2.2: let $K_{\mathbb{Z}}^{N}(y_0, \dots, y_{n+N-1}) = K_{\mathbb{Z}}(y_N, \dots, y_{N+n-1})$, and let $K_N$ be the function obtained from $K_{\mathbb{Z}}^{N}$ by applying the above procedure. After a change of variables, we get from (5.1)

\[ \int \bigl| K_N(T^{-N} x, \dots, x, Tx, \dots, T^{n-1} x) - \mathbb{E}(K_N) \bigr|^{2q-2} d\mu_{\mathbb{Z}}(x) \le C \Bigl( \sum_{j} \mathrm{Lip}_j(K_{\mathbb{Z}})^2 \Bigr)^{q-1}. \]

When $N$ tends to $\infty$, $K_N(T^{-N} x, \dots, x, Tx, \dots, T^{n-1} x)$ converges to $K_{\mathbb{Z}}(x, \dots, T^{n-1} x)$. Hence, we obtain the desired concentration inequality by letting $N$ tend to infinity in the previous equation.

To prove (5.1), we follow the same strategy as in the previous section. Note that we cannot directly apply Theorem 4.1, since the Lipschitz constants of $K$ are not easily bounded in terms of those of $K_{\mathbb{Z}}$, due to the non-uniform expansion. Therefore, we have to reimplement the strategy from scratch.

Let us first start with a crucial remark. When one controls the Lipschitz constants of $K$ in terms of those of $K_{\mathbb{Z}}$, a point $x_*$ blocks the propagation of modifications, in the following sense: consider a difference $K(x_0, \dots, x_{n-1}) - K(x'_0, \dots, x'_{n-1})$ where $x_i$ and $x'_i$ coincide at all indices but $j$. By construction of $K$, this is equal to $K_{\mathbb{Z}}(y_0, \dots, y_{n-1}) - K_{\mathbb{Z}}(y'_0, \dots, y'_{n-1})$ for some points $y_i, y'_i \in X_{\mathbb{Z}}$. The definition shows that $y_i = y'_i$ for $i < j$. On the other hand, $y_i$ and $y'_i$ might be different for all $i \ge j$, not only for $i = j$. However, if there is an index $k > j$ such that $x_k = x'_k = x_*$, then $y_i = y'_i$ for $i \ge k$: this follows directly from the construction. Therefore, $K(x_0, \dots, x_{n-1}) - K(x'_0, \dots, x'_{n-1})$ will be estimated only in terms of $\mathrm{Lip}_i(K_{\mathbb{Z}})$ for $j \le i < k$.

To follow the same strategy as in the previous sections, we need to show that $K_p$ is close to an integral, as in Lemma 2.3. To do so, as in the proof of this lemma, we define a function $f_i$ as in (2.1), and control its iterates under the transfer operator. We decompose $K_p(x_p, \dots) = \sum_{i=0}^{p-1} \mathcal{L}^{p-i} f_i(x_p) + K(x_*, \dots, x_*, x_p, \dots)$, where

\[ f_i(z) = \sum_{T^i y = z} g^{(i)}(y) \bigl( K(y, \dots, T^i y, x_*, \dots, x_*, x_p, \dots) - K(y, \dots, T^{i-1} y, x_*, \dots, x_*, x_p, \dots) \bigr). \]

When $i < p-1$, there is a point $x_*$ in the definition of $f_i$, blocking the propagation of modifications as we explained above. Therefore, we may follow the proofs of Lemmas 4.5 and 4.6 in this setting, to obtain the following:

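Lemma 5.2. If $i < p-1$, we have for any $r \ge 0$ and any $z \in \Delta_0$

\[ \Bigl| \mathcal{L}^r f_i(z) - \int_{\Delta} f_i \Bigr| \le \sum_{j=0}^{i} \mathrm{Lip}_j(K_{\mathbb{Z}}) \Bigl( \sum_{k=0}^{r} c_k^{(q-2)} c_{i-j+r-k}^{(q-1)} \Bigr). \]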



On the other hand, there is no such blocking effect for $f_{p-1}$, yielding a worse estimate. Indeed, in $f_{p-1}$, one considers averages of terms of the form $K(y, \dots, T^{p-1} y, x_p, \dots) - K(y, \dots, T^{p-2} y, x_*, x_p, \dots)$. Considering the definition of $K$ in terms of $K_{\mathbb{Z}}$, this difference reads $K_{\mathbb{Z}}(y'_0, \dots, y'_{n-1}) - K_{\mathbb{Z}}(y''_0, \dots, y''_{n-1})$ where the points $y'_j, y''_j$ belong to $X_{\mathbb{Z}}$, coincide for $j < p-1$ and may differ for $j \ge p-1$. For $j > p-1$, the points $y'_j$ and $y''_j$ have the same future, and the same past up to the index $j-p$. Therefore, $d_{\mathbb{Z}}(y'_j, y''_j) \le \beta^{\mathrm{Card}\{k \in [p,j] : x_k \in \Delta_0\}}$. Averaging over the points $y$ with $T^{p-1}(y) = z$, we get

\[ |f_{p-1}(z)| \le \sum_{j=p-1}^{n-1} \mathrm{Lip}_j(K_{\mathbb{Z}}) \, \beta^{\mathrm{Card}\{k \in [p,j] : x_k \in \Delta_0\}}. \]

The functions $\mathcal{L} f_{p-1}$ and $\mathcal{L} f_{p-1} - \int f_{p-1}$ also satisfy the same bound.

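Still following the strategy of proof of Section 4, we deduce from those estimates an analogue of Lemma 4.7 with an additional error term coming from $f_{p-1}$: for all $x_p \in \Delta_0$,

\[ \Bigl| K_p(x_p, \dots) - \int K(y, \dots, T^{p-1} y, x_p, \dots) \, d\mu(y) \Bigr| \le \sum_{j=0}^{p-1} \mathrm{Lip}_j(K_{\mathbb{Z}}) \, c_{p-j}^{(q-2)} + \sum_{j=p-1}^{n-1} \mathrm{Lip}_j(K_{\mathbb{Z}}) \, \beta^{\mathrm{Card}\{k \in [p,j] : x_k \in \Delta_0\}}. \]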
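In turn, this yields an analogue of Lemma 4.8, still with an additional error term: for all $\kappa \le 2q$, and for all $x_{p+1} \in \Delta_0$,

\[ \mathbb{E}(|D_p|^\kappa \mid \mathcal{F}_{p+1})(x_{p+1}, \dots) \le C \Bigl( \sum_{j \ge p+1} \mathrm{Lip}_j(K_{\mathbb{Z}}) \, \beta^{\mathrm{Card}\{k \in [p+1,j] : x_k \in \Delta_0\}} \Bigr)^{\kappa} + C \sum_{j=0}^{p} \mathrm{Lip}_j(K_{\mathbb{Z}})^\kappa \, c_{p-j}^{(q-2)} + C \sum_{h \ge 0} c_h^{(q-\kappa/2)} \Bigl( \sum_{j=p-h+1}^{p} \mathrm{Lip}_j(K_{\mathbb{Z}})^2 \Bigr)^{\kappa/2}. \tag{5.2} \]

On the other hand, $\mathbb{E}(|D_p|^\kappa \mid \mathcal{F}_{p+1})(x_{p+1}, \dots) = 0$ if $h(x_{p+1}) > 0$.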

We can now conclude the proof of (5.1), following the strategy we used to prove Theorem 4.1 in Subsection 4.4. By Rosenthal-Burkholder inequality, we have

\[ \mathbb{E} |K - \mathbb{E} K|^{2q-2} = \mathbb{E} \Bigl| \sum D_p \Bigr|^{2q-2} \le C \, \mathbb{E} \Bigl[ \Bigl( \sum \mathbb{E}(D_p^2 \mid \mathcal{F}_{p+1}) \Bigr)^{q-1} \Bigr] + C \sum \mathbb{E}(|D_p|^{2q-2}). \]

The conditional expectations are estimated thanks to (5.2). The terms that were already present in the proof of Theorem 4.1 are handled exactly in the same way. Therefore, we only need to deal with the additional term. Let us define a function $\Phi_j(x) = \beta^{\mathrm{Card}\{k \in [1,j] : T^k(x) \in \Delta_0\}}$ for $x \in \Delta_0$, and $\Phi_j(x) = 0$ elsewhere (it is closely related to the function $\Psi_j$ of Lemma 4.4, with the difference that it is supported in $\Delta_0$). The additional term in the Rosenthal-Burkholder inequality is bounded by

\[ C \int \Bigl( \sum_{p \ge 0} \Bigl( \sum_{j \ge p+1} \mathrm{Lip}_j(K_{\mathbb{Z}}) \Phi_{j-p-1}(T^{p+1} x) \Bigr)^{2} \Bigr)^{q-1} d\mu(x) + C \sum_{p \ge 0} \int \Bigl( \sum_{j \ge p+1} \mathrm{Lip}_j(K_{\mathbb{Z}}) \Phi_{j-p-1}(T^{p+1} x) \Bigr)^{2q-2} d\mu(x). \]

The inequality $\sum x_i^{q-1} \le (\sum x_i)^{q-1}$ shows that the second term is bounded by the first one. Therefore, to conclude the proof, it is sufficient to prove the following inequality:

\[ \int \Bigl( \sum_{p \ge 0} \Bigl( \sum_{j \ge p+1} \mathrm{Lip}_j(K_{\mathbb{Z}}) \Phi_{j-p-1}(T^{p+1} x) \Bigr)^{2} \Bigr)^{q-1} d\mu(x) \le C \Bigl( \sum_{j} \mathrm{Lip}_j(K_{\mathbb{Z}})^2 \Bigr)^{q-1}. \tag{5.3} \]

This estimate is formulated solely in terms of the non-invertible system. Its proof is technical and complicated. Therefore, we defer it to Theorem A.1 in Appendix A. Modulo this result, this concludes the proof of (5.1), and of Theorem 5.1.

6. Weak polynomial concentration inequalities 

The results of Section 4 are not completely satisfactory for the significant example of intermittent maps. Indeed, for Pomeau-Manneville maps of index $\alpha \in (0, 1)$ (with $T(x) = x + c x^{1+\alpha}(1 + o(1))$ for small $x$, see (7.4) below), the return time function to the rightmost interval satisfies a bound $\mu\{\phi = n\} \sim C / n^{1/\alpha + 1}$. Therefore, the corresponding Young tower has a moment of order $q$ for any $q < 1/\alpha$ (which yields a concentration inequality of order $Q$ for any $Q < 2/\alpha - 2$ when $\alpha < 1/2$), but it does not have a moment of order $1/\alpha$. Indeed, it only has a weak moment of order $1/\alpha$, meaning that $\mu\{\phi > t\} \le C t^{-1/\alpha}$. An optimal concentration statement for such a map would therefore be formulated in terms of weak moments. This is our goal in this section.
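
The polynomial tail of the return time can be observed numerically. The sketch below assumes one standard normalization of the Pomeau-Manneville map, namely $T(x) = x(1 + 2^\alpha x^\alpha)$ on $[0, 1/2]$ and $T(x) = 2x - 1$ on $(1/2, 1]$ (this explicit formula is an assumption of the example). It computes the points $x_0 = 1/2$, $T(x_{n+1}) = x_n$ along the left branch: a point of $(x_{n+1}, x_n]$ needs $n+1$ iterations to leave $[0, 1/2]$, so the decay $x_n \sim \tfrac12 (\alpha n)^{-1/\alpha}$ governs the tail of the return time to the right interval.

```python
def left_branch(x, alpha):
    """Left branch x -> x * (1 + (2x)^alpha) of the assumed intermittent map."""
    return x * (1.0 + (2.0 * x) ** alpha)


def left_preimage(t, alpha, iterations=80):
    """Solve left_branch(x) = t for x in [0, 1/2] by bisection (branch is increasing)."""
    lo, hi = 0.0, 0.5
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if left_branch(mid, alpha) < t:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def preimage_sequence(alpha, n_max):
    """Return [x_0, ..., x_{n_max}] with x_0 = 1/2 and left_branch(x_{n+1}) = x_n."""
    xs = [0.5]
    for _ in range(n_max):
        xs.append(left_preimage(xs[-1], alpha))
    return xs


def predicted_decay(alpha, n):
    """Leading-order asymptotics (1/2) * (alpha * n)^(-1/alpha) of x_n."""
    return 0.5 * (alpha * n) ** (-1.0 / alpha)
```

For instance, with `alpha = 0.5` the ratio `preimage_sequence(0.5, 2000)[2000] / predicted_decay(0.5, 2000)` is close to 1, consistent with a weak moment of order $1/\alpha$ but no strong moment of that order.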

Theorem 6.1. Let $T : \Delta \to \Delta$ be a non-uniform Young tower. Assume that, for some $q > 2$, the return time $\phi$ to the basis of the tower has a weak moment of order $q$, i.e., there exists a constant $C > 0$ such that $\mu\{x \in \Delta_0 : \phi(x) > t\} \le C t^{-q}$ for all $t > 0$. Then $T$ satisfies a weak polynomial concentration inequality with moment $2q-2$, i.e., there exists a constant $C > 0$ such that, for any $n \in \mathbb{N}$, for any separately Lipschitz function $K(x_0, \dots, x_{n-1})$, and any $t > 0$,

\[ \mu \Bigl\{ x : \Bigl| K(x, \dots, T^{n-1} x) - \int K(y, \dots, T^{n-1} y) \, d\mu(y) \Bigr| > t \Bigr\} \le C t^{-(2q-2)} \Bigl( \sum_{j} \mathrm{Lip}_j(K)^2 \Bigr)^{q-1}. \]

Let us introduce a convenient notation. When $Z$ is a real-valued random variable and $Q \ge 1$, we write $\|Z\|_{L^{Q,w}} = \sup_t t \, \mathbb{P}(|Z| > t)^{1/Q}$, so that $\mathbb{P}(|Z| > t) \le t^{-Q} \|Z\|_{L^{Q,w}}^{Q}$. This is the weak $L^Q$ (semi)norm of $Z$. With this notation, the statement of the theorem becomes

\[ \|K - \mathbb{E}(K)\|_{L^{2q-2,w}}^{2q-2} \le C \Bigl( \sum_{j} \mathrm{Lip}_j(K)^2 \Bigr)^{q-1}, \]

in close analogy with the statement of Theorem 4.1. Note that $\|Z\|_{L^{Q,w}}$ is not a true norm: the triangle inequality fails, and is replaced by $\|Z + Z'\|_{L^{Q,w}} \le C (\|Z\|_{L^{Q,w}} + \|Z'\|_{L^{Q,w}})$. On the other hand, $\|\max(|Z|, |Z'|)\|_{L^{Q,w}}^{Q} \le \|Z\|_{L^{Q,w}}^{Q} + \|Z'\|_{L^{Q,w}}^{Q}$.
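
The weak norm is easy to compute for a finitely supported random variable, which gives a quick sanity check of the two properties just stated. In the sketch below (function names are illustrative assumptions), the supremum of $t^Q \, \mathbb{P}(|Z| > t)$ is obtained by letting $t$ increase to each atom $a$ of $|Z|$, which yields $\max_a a^Q \, \mathbb{P}(|Z| \ge a)$.

```python
def weak_norm_power(values, probs, Q):
    """Return sup_t t^Q * P(|Z| > t) for Z taking values[i] with probability probs[i].

    For a finitely supported Z the supremum equals max over atoms a of
    a^Q * P(|Z| >= a), approached as t increases to a.
    """
    best = 0.0
    for a in set(abs(v) for v in values):
        tail = sum(p for v, p in zip(values, probs) if abs(v) >= a)
        best = max(best, a ** Q * tail)
    return best


def strong_moment(values, probs, Q):
    """Return E|Z|^Q, which dominates the weak quantity by Markov's inequality."""
    return sum(p * abs(v) ** Q for v, p in zip(values, probs))
```

On a common finite sample space, applying `weak_norm_power` to the pointwise maximum of two variables gives at most the sum of the two individual values, which is the inequality used later to pass from $\sup_p |D_p|$ to a sum over $p$.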

Since a sequence with a weak moment of order $q > 2$ has a strong moment of order 2, we may use intermediate results of the proof of Theorem 4.1 (and especially Lemma 4.7) to prove Theorem 6.1. The proofs diverge at the level of Lemma 4.8: the version we will need in the weak moments case is the following.

Lemma 6.2. Assume that $\phi$ has a weak moment of order $q > 2$. For all $t > 0$,

\[ \mathbb{P}(|D_p| > t \mid \mathcal{F}_{p+1})(x_{p+1}, \dots) \le C t^{-(2q-2)} \sum_{j=0}^{p} \mathrm{Lip}_j(K)^{2q-2} \, c_{p-j}^{(0)} + C t^{-(2q-2)} \Bigl( \sum_{j} \mathrm{Lip}_j(K)^2 \Bigr)^{q-2} \sup_{h > 0} \Bigl( h^{-1} \sum_{j=p-h+1}^{p} \mathrm{Lip}_j(K) \Bigr)^{2}. \]
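Proof. If $h(x_{p+1}) > 0$, then $x_{p+1}$ has a unique preimage $x_p$, and $D_p(x_p, x_{p+1}, \dots) = 0$. Therefore, there is nothing to prove. Assume now that $h(x_{p+1}) = 0$, and let $\{z_\alpha\}$ denote the preimages of $x_{p+1}$ under $T$. Writing $A(z) = D_p(z, x_{p+1}, \dots)$, we have

\[ \mathbb{P}(|D_p| > t \mid \mathcal{F}_{p+1})(x_{p+1}, \dots) = \sum_{|A(z_\alpha)| > t} g(z_\alpha). \]

Since $\phi$ has a weak moment of order $q > 2$, it has a strong moment of order 2. Therefore, (4.8) gives

\[ |A(z)| \le \sum_{j \le p-h} \mathrm{Lip}_j(K) \, c_{p-h-j}^{(0)} + \sum_{j=p-h+1}^{p} \mathrm{Lip}_j(K) =: A_1(z) + A_2(z). \]

If $|A(z)| > t$, then $A_1(z) > t/2$ or $A_2(z) > t/2$. Therefore, $\mathbb{P}(|D_p| > t \mid \mathcal{F}_{p+1})$ is bounded by

\[ \sum_{A_1(z_\alpha) > t/2} g(z_\alpha) + \sum_{A_2(z_\alpha) > t/2} g(z_\alpha). \tag{6.1} \]

For the first sum,

\[ \sum_{A_1(z_\alpha) > t/2} g(z_\alpha) \le C \sum_{\alpha} g(z_\alpha) \bigl( A_1(z_\alpha)/t \bigr)^{2q-2} \le C t^{-(2q-2)} \sum_{h \ge 0} \mu(\phi = h) \Bigl( \sum_{j \le p-h} \mathrm{Lip}_j(K) \, c_{p-h-j}^{(0)} \Bigr)^{2q-2} \le C t^{-(2q-2)} \sum_{h \ge 0} \mu(\phi = h) \sum_{j \le p-h} \mathrm{Lip}_j(K)^{2q-2} \, c_{p-h-j}^{(0)}. \]

The coefficient of $\mathrm{Lip}_j(K)^{2q-2}$ in this expression is $\sum_{h=0}^{p-j} \mu(\phi = h) \, c_{p-h-j}^{(0)} \le c_{p-j}^{(0)}$. Therefore, this is bounded by $C t^{-(2q-2)} \sum_{j \le p} \mathrm{Lip}_j(K)^{2q-2} c_{p-j}^{(0)}$.

The second sum of (6.1) is bounded by $C \sum \mu(\phi = \ell)$, where the sum is restricted to those $\ell$ with $\sum_{j=p-\ell+1}^{p} \mathrm{Lip}_j(K) > t/2$. Let $h$ be the smallest such $\ell$; the sum is bounded by

\[ \mu(\phi \ge h) \le C h^{-q} \le C h^{-q} \Bigl( t^{-1} \sum_{j=p-h+1}^{p} \mathrm{Lip}_j(K) \Bigr)^{2q-2}. \]

To bound the last sum, we use the inequality $(\sum_{j=p-h+1}^{p} x_j)^2 \le h \sum x_j^2$ to obtain

\[ h^{-q} \Bigl( \sum_{j=p-h+1}^{p} \mathrm{Lip}_j(K) \Bigr)^{2q-2} = h^{-q} \Bigl( \sum_{j=p-h+1}^{p} \mathrm{Lip}_j(K) \Bigr)^{2q-4} \Bigl( \sum_{j=p-h+1}^{p} \mathrm{Lip}_j(K) \Bigr)^{2} \le h^{-q} \Bigl( h \sum_{j=p-h+1}^{p} \mathrm{Lip}_j(K)^2 \Bigr)^{q-2} \Bigl( \sum_{j=p-h+1}^{p} \mathrm{Lip}_j(K) \Bigr)^{2} \le h^{-2} \Bigl( \sum_{j=p-h+1}^{p} \mathrm{Lip}_j(K) \Bigr)^{2} \Bigl( \sum_{j} \mathrm{Lip}_j(K)^2 \Bigr)^{q-2}. \]

This concludes the proof. □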



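To proceed, we need an analogue of Rosenthal-Burkholder inequality for weak moments. Although it is not written explicitly in Burkholder's article [Bur73], it follows easily from the techniques developed there, giving the following statement.

Theorem 6.3. Let $(D_p)$ be a sequence of reverse martingale differences with respect to a decreasing filtration $\mathcal{F}_p$ (i.e., $D_p$ is $\mathcal{F}_p$-measurable and $\mathbb{E}(D_p \mid \mathcal{F}_{p+1}) = 0$). For all $Q \ge 2$,

\[ \Bigl\| \sum D_p \Bigr\|_{L^{Q,w}}^{Q} \le C \Bigl\| \sum \mathbb{E}(D_p^2 \mid \mathcal{F}_{p+1}) \Bigr\|_{L^{Q/2,w}}^{Q/2} + C \bigl\| \sup_p |D_p| \bigr\|_{L^{Q,w}}^{Q}. \]

In particular,

\[ \Bigl\| \sum D_p \Bigr\|_{L^{Q,w}}^{Q} \le C \Bigl\| \sum \mathbb{E}(D_p^2 \mid \mathcal{F}_{p+1}) \Bigr\|_{L^{Q/2,w}}^{Q/2} + C \sum_{p} \|D_p\|_{L^{Q,w}}^{Q}. \]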
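Proof. By a truncation argument, it suffices to prove the result for bounded random variables, and $p \in [0, P]$. Define three random variables

\[ X = \sup_{0 \le p \le P} \Bigl| \sum_{k=p}^{P} D_k \Bigr|, \qquad Y = \Bigl( \sum_{p} \mathbb{E}(D_p^2 \mid \mathcal{F}_{p+1}) \Bigr)^{1/2}, \qquad Z = \max_p |D_p|. \]

The inequality (21.2) in [Bur73] gives, for any $0 < \delta < \beta - 1$,

\[ \mathbb{P}(X > \beta t, \ \max(Y, Z) \le \delta t) \le \varepsilon \, \mathbb{P}(X > t), \]

where $\varepsilon = \delta^2 / (\beta - \delta - 1)^2$. In particular,

\[ (\beta t)^Q \, \mathbb{P}(X > \beta t) \le (\beta t)^Q \, \mathbb{P}(\max(Y, Z) > \delta t) + (\beta t)^Q \varepsilon \, \mathbb{P}(X > t) \le \beta^Q \delta^{-Q} \|\max(Y, Z)\|_{L^{Q,w}}^{Q} + \beta^Q \varepsilon \|X\|_{L^{Q,w}}^{Q}. \]

Taking the supremum over $t$, we obtain

\[ \|X\|_{L^{Q,w}}^{Q} \le \beta^Q \delta^{-Q} \|\max(Y, Z)\|_{L^{Q,w}}^{Q} + \beta^Q \varepsilon \|X\|_{L^{Q,w}}^{Q}. \]

If $\beta > 1$ is fixed, and $\delta$ is chosen small enough so that $\beta^Q \varepsilon < 1$, this yields $\|X\|_{L^{Q,w}}^{Q} \le C \|\max(Y, Z)\|_{L^{Q,w}}^{Q}$. Since $|\sum D_p| \le X$ and $\|Y\|_{L^{Q,w}}^{Q} = \|Y^2\|_{L^{Q/2,w}}^{Q/2}$, this proves the theorem. □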



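Proof of Theorem 6.1. We have $K - \mathbb{E}(K) = \sum D_p$, hence

\[ \|K - \mathbb{E}(K)\|_{L^{2q-2,w}}^{2q-2} \le C \Bigl\| \sum \mathbb{E}(D_p^2 \mid \mathcal{F}_{p+1}) \Bigr\|_{L^{q-1,w}}^{q-1} + C \sum_{p} \|D_p\|_{L^{2q-2,w}}^{2q-2}. \]

For the first term, we use the inequality $\|u\|_{L^{Q,w}} \le \|u\|_{L^Q}$. Therefore, this term is bounded by

\[ C \, \mathbb{E} \Bigl[ \Bigl( \sum_{p} \mathbb{E}(D_p^2 \mid \mathcal{F}_{p+1}) \Bigr)^{q-1} \Bigr]. \]

Since $\phi$ has a weak moment of order $q$, it has a strong moment of order 2. Therefore (4.9) gives $\mathbb{E}(D_p^2 \mid \mathcal{F}_{p+1}) \le \sum_{j \le p} c_{p-j}^{(0)} \mathrm{Lip}_j(K)^2$. Hence, the first term in Rosenthal-Burkholder inequality is bounded by

\[ C \Bigl( \sum_{p} \sum_{j=0}^{p} \mathrm{Lip}_j(K)^2 \, c_{p-j}^{(0)} \Bigr)^{q-1} \le C \Bigl( \sum_{j} \mathrm{Lip}_j(K)^2 \Bigr)^{q-1}. \]

Let us now turn to $\|D_p\|_{L^{2q-2,w}}$. Integrating the estimates of Lemma 6.2, we get

\[ \|D_p\|_{L^{2q-2,w}}^{2q-2} \le C \sum_{j=0}^{p} \mathrm{Lip}_j(K)^{2q-2} \, c_{p-j}^{(0)} + C \Bigl( \sum_{j} \mathrm{Lip}_j(K)^2 \Bigr)^{q-2} \sup_{h > 0} \Bigl( h^{-1} \sum_{j=p-h+1}^{p} \mathrm{Lip}_j(K) \Bigr)^{2}. \tag{6.2} \]

We should sum those estimates over $p$. For the first sum, we obtain

\[ \sum_{j} \mathrm{Lip}_j(K)^{2q-2} \sum_{p \ge j} c_{p-j}^{(0)} \le C \sum_{j} \mathrm{Lip}_j(K)^{2q-2} \le C \Bigl( \sum_{j} \mathrm{Lip}_j(K)^2 \Bigr)^{q-1}. \]

For the second sum, let us define a function $f$ on $\mathbb{Z}$ by $f(j) = \mathrm{Lip}_j(K)$. This function belongs to $\ell^2(\mathbb{Z})$. The corresponding maximal function $Mf(p) = \sup_{h > 0} \frac{1}{2h+1} \sum_{j=p-h}^{p+h} f(j)$ also belongs to $\ell^2(\mathbb{Z})$ and satisfies $\|Mf\|_{\ell^2} \le C \|f\|_{\ell^2}$, by Hardy-Littlewood maximal inequality. In particular,

\[ \sum_{p} \sup_{h > 0} \Bigl( h^{-1} \sum_{j=p-h+1}^{p} \mathrm{Lip}_j(K) \Bigr)^{2} \le C \sum_{j} \mathrm{Lip}_j(K)^2. \]

Therefore, the contribution of the second term in (6.2) is bounded by $C (\sum_j \mathrm{Lip}_j(K)^2)^{q-1}$. This concludes the proof of Theorem 6.1. □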
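
The maximal inequality used for the second sum can be tested numerically on a finite sequence. The sketch below (function names and test data are illustrative assumptions) computes the one-sided maximal function $\sup_{h \ge 1} h^{-1} \sum_{j=p-h+1}^{p} f(j)$ for a nonnegative sequence extended by zero, and compares squared $\ell^2$ norms; for one-sided averages the classical Hardy-Littlewood estimate allows the constant 4 in $\ell^2$.

```python
def one_sided_maximal(f):
    """Return [Mf(0), ..., Mf(n-1)] with Mf(p) = sup_{h>=1} (1/h) sum_{j=p-h+1..p} f(j).

    f is a list of nonnegative numbers, extended by 0 at negative indices, so
    windows reaching below index 0 only add zeros and can be ignored.
    """
    prefix = [0.0]
    for x in f:
        prefix.append(prefix[-1] + x)
    out = []
    for p in range(len(f)):
        best = 0.0
        for h in range(1, p + 2):
            avg = (prefix[p + 1] - prefix[p + 1 - h]) / h
            best = max(best, avg)
        out.append(best)
    return out


def squared_l2_norm(seq):
    """Return the sum of squares of the entries of seq."""
    return sum(x * x for x in seq)
```

Since the window $h = 1$ is allowed, $Mf \ge f$ pointwise, so the ratio of the two squared norms lies between 1 and the constant of the maximal inequality.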

Remark 6.4. In view of Theorems 5.1 and 6.1, it would seem natural to try to prove a weak polynomial concentration inequality in invertible systems with weak moment controls on the return time. We have not been able to prove such a statement.

7. Applications 

In this section, we first give examples of dynamical systems satisfying an exponential concentration inequality or only a polynomial concentration inequality. We also give examples of systems satisfying a weak polynomial concentration inequality. Second, we present several applications of these inequalities to specific observables. We shall not attempt to be exhaustive. Previous results are found in [CMS02, CCS05b, CCRV09]. For instance, we strengthen the bounds obtained in [CCS05b] since for dynamical systems modeled by a uniform Young tower with exponential tails, we can now use an exponential concentration inequality instead of a polynomial concentration inequality with moment 2 as in [CCS05b]. For systems modeled by a non-uniform Young tower, only a polynomial concentration inequality with moment 2 was known for intermittent maps of the interval (under some restrictions on the parameter). We now have at our disposal an optimal polynomial concentration inequality for these maps, and more generally, for dynamical systems modeled by non-uniform Young towers with polynomial tails.

7.1. Examples of dynamical systems. There are well-known dynamical systems $(X, T)$ which can be modeled by a uniform Young tower with exponential tails [You98]. Examples of invertible dynamical systems fitting this framework are for instance Axiom A attractors, Hénon attractors for Benedicks-Carleson parameters [BY00], piecewise hyperbolic maps like the Lozi attractor, some billiards with convex scatterers, etc. Such systems admit an SRB measure $\mu$ and there is an invertible uniform Young tower $(\Delta_{\mathbb{Z}}, T_{\mathbb{Z}}, \mu_{\mathbb{Z}})$ and a projection map $\pi : \Delta_{\mathbb{Z}} \to X$ such that $T \circ \pi = \pi \circ T_{\mathbb{Z}}$ and $\mu = \mu_{\mathbb{Z}} \circ \pi^{-1}$. In the non-invertible case, there is a non-invertible Young tower $(\Delta, T, \mu)$ and a corresponding projection map. A non-invertible example is the quadratic family for Benedicks-Carleson parameters. In both cases, it can also be ensured that the projection map is contracting, i.e., $d(\pi x, \pi y) \le d_\beta(x, y)$ for every $x, y$ in the same partition element. Here, $d_\beta$ denotes the (unilateral or bilateral) symbolic distance in the tower given by $d_\beta(x, y) = \beta^{s(x,y)}$ for some $\beta < 1$. In particular, if $f$ is a bounded Lipschitz function on $X$, it lifts to a function $f \circ \pi$ which is Lipschitz in the tower. More generally, if $f$ is Hölder continuous, then its lift is Lipschitz for $d_\beta$ if $\beta$ is close enough to 1. Therefore, all the results we proved in the previous sections for Lipschitz observables $K$ have a counterpart for Hölder ones; we will not give further details in this direction and restrict to the Lipschitz situation for ease of exposition. We will also assume for simplicity that $X$ is bounded.

Theorem 7.1. Let $(X, T)$ be a dynamical system modeled by a uniform Young tower with exponential tails and let $\mu$ be its SRB measure. There exists $C > 0$ such that, for any $n \in \mathbb{N}$, for any separately Lipschitz function $K(x_0, \dots, x_{n-1})$,

\[ \int e^{K(x, Tx, \dots, T^{n-1} x)} \, d\mu(x) \le e^{\int K(x, \dots, T^{n-1} x) \, d\mu(x)} \, e^{C \sum_{j=0}^{n-1} \mathrm{Lip}_j(K)^2}. \tag{7.1} \]

This theorem is an obvious consequence of Theorem 3.4 in the invertible case and of Theorem 3.1 in the non-invertible case. Inequality (7.1) was previously known only for uniformly piecewise expanding maps of the interval and subshifts of finite type equipped with a Gibbs measure [CMS02]. Under the assumptions of the previous theorem, only a polynomial concentration with moment 2 had been proven [CCS05a].

An immediate consequence of (7.1) is the following inequality for upper deviations: for all $t > 0$ and for all $n \in \mathbb{N}$,

\[ \mu \Bigl\{ x \in X : K(x, Tx, \dots, T^{n-1} x) - \int K(y, \dots, T^{n-1} y) \, d\mu(y) > t \Bigr\} \le e^{-\frac{t^2}{4C \sum_{j=0}^{n-1} \mathrm{Lip}_j(K)^2}}. \tag{7.2} \]

The same bound holds for lower deviations by applying (7.2) to $-K$.
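
The passage from (7.1) to (7.2) is the usual Chernoff optimization: applying (7.1) to $\lambda K$ and Markov's inequality gives the exponent $-\lambda t + C \lambda^2 S$ with $S = \sum_j \mathrm{Lip}_j(K)^2$, which is minimized at $\lambda = t/(2CS)$. The following sketch checks this numerically; the function names and the numerical values of $t$, $C$, $S$ are illustrative assumptions.

```python
def chernoff_exponent(lam, t, C, S):
    """Exponent -lam*t + C*lam^2*S obtained from (7.1) applied to lam*K."""
    return -lam * t + C * lam * lam * S


def optimal_lambda(t, C, S):
    """Minimizer lam = t / (2*C*S) of the quadratic exponent."""
    return t / (2.0 * C * S)


def optimal_exponent(t, C, S):
    """Minimal value -t^2 / (4*C*S), the exponent appearing in (7.2)."""
    return -t * t / (4.0 * C * S)
```

Evaluating `chernoff_exponent` at `optimal_lambda(t, C, S)` returns `optimal_exponent(t, C, S)`, and any other positive `lam` gives a larger (weaker) exponent.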

Let us now consider dynamical systems modeled by a non-uniform Young tower with polynomial tails. In the invertible case, there is an invertible non-uniform Young tower $(\Delta_{\mathbb{Z}}, T_{\mathbb{Z}}, \mu_{\mathbb{Z}})$ and a projection map $\pi : \Delta_{\mathbb{Z}} \to X$, and the SRB measure is $\mu = \mu_{\mathbb{Z}} \circ \pi^{-1}$, provided that $\sum \phi(\alpha) \mu_{\mathbb{Z}}(\Delta_{\alpha,0}) < \infty$. If $\sum \phi(\alpha)^q \mu_{\mathbb{Z}}(\Delta_{\alpha,0}) < \infty$, we shall simply say that the tower has $L^q$ tails. Similarly, if $\sum_{\phi(\alpha) > n} \mu_{\mathbb{Z}}(\Delta_{\alpha,0}) \le C n^{-q}$, we shall say that the tower has weak $L^q$ tails. We can of course rephrase what we have just said in the non-invertible case.

Theorem 7.2. Let $(X,T)$ be a dynamical system modeled by a non-uniform Young tower with $L^q$ tails, for some $q \ge 2$. Then $T$ satisfies a polynomial concentration inequality with moment $2q - 2$, i.e., there exists a constant $C > 0$ such that, for any $n \in \mathbb{N}$ and any separately Lipschitz function $K(x_0, \dots, x_{n-1})$,

\[ \int \Big| K(x, \dots, T^{n-1}x) - \int K(y, \dots, T^{n-1}y)\,d\mu(y) \Big|^{2q-2} d\mu(x) \le C \Big( \sum_{j=0}^{n-1} \mathrm{Lip}_j(K)^2 \Big)^{q-1}. \]

Using Markov's inequality we get at once that, for any $t > 0$ and for any $n \in \mathbb{N}$,

\[ \mu\Big\{x \in X : \Big|K(x,Tx,\dots,T^{n-1}x) - \int K(y, \dots, T^{n-1}y)\,d\mu(y)\Big| > t\Big\} \le C\,\frac{\big(\sum_{j=0}^{n-1}\mathrm{Lip}_j(K)^2\big)^{q-1}}{t^{2q-2}}. \tag{7.3} \]

If the tails are only in weak $L^q$, Theorem 6.1 shows that (7.3) still holds.

The fundamental example is an expanding map of the interval with an indifferent fixed point [You99]. For the sake of definiteness, we consider for $\alpha \in (0, 1)$ the so-called "intermittent" map $T : [0, 1] \to [0, 1]$ defined by

\[ T(x) = \begin{cases} x(1 + 2^\alpha x^\alpha) & \text{if } 0 \le x \le 1/2, \\ 2x - 1 & \text{if } 1/2 < x \le 1. \end{cases} \tag{7.4} \]

There is a unique absolutely continuous invariant probability measure $d\mu(x) = h(x)\,dx$ such that $h(x) \sim x^{-\alpha}$ as $x \to 0$. This map is modeled by a non-uniform Young tower $(\Delta, \tilde\mu)$ such that $\tilde\mu\{\varphi = n\} \sim C/n^{1/\alpha + 1}$. The return time has a weak moment of order $1/\alpha$. Thus, for $\alpha \in (0, 1/2)$, the previous results yield:

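Corollary 7.3. Let $T$ be the map (7.4) and $\mu$ its absolutely continuous invariant probability measure. There exists a constant $C > 0$ such that, for any $n \in \mathbb{N}$, for any $t > 0$ and for any separately Lipschitz function $K(x_0, \dots, x_{n-1})$,

\[ \mu\Big\{x \in X : \Big|K(x,Tx,\dots,T^{n-1}x) - \int K(y, \dots, T^{n-1}y)\,d\mu(y)\Big| > t\Big\} \le C\,\frac{\big(\sum_{j=0}^{n-1}\mathrm{Lip}_j(K)^2\big)^{1/\alpha-1}}{t^{2/\alpha-2}}. \]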



This estimate readily gives bounds for the moments of order $q \ne 2/\alpha - 2$. Indeed, if $Z$ is a random variable satisfying $\mathbb{P}(|Z| > t) \le (A/t)^Q$, then using the formula $\mathbb{E}(|Z|^q) = \int_0^\infty q t^{q-1}\,\mathbb{P}(|Z| > t)\,dt$ and the tail estimates, one gets

\[ \mathbb{E}(|Z|^q) \le \frac{Q}{Q-q}\,A^q \quad \text{for } q < Q, \]

and if $Z$ is bounded

\[ \mathbb{E}(|Z|^q) \le \frac{q}{q-Q}\,A^Q\,\|Z\|_\infty^{q-Q} \quad \text{for } q > Q. \]
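The two moment bounds can be checked by splitting the integral; the following computation is a sketch that uses only the tail assumption $\mathbb{P}(|Z| > t) \le (A/t)^Q$ together with the trivial bound $\mathbb{P}(|Z| > t) \le 1$ on $[0, A]$.

```latex
\begin{align*}
\mathbb{E}(|Z|^q)
  &= \int_0^\infty q t^{q-1}\,\mathbb{P}(|Z|>t)\,dt
   \le \int_0^A q t^{q-1}\,dt + \int_A^\infty q t^{q-1}\Big(\frac{A}{t}\Big)^{Q} dt \\
  &= A^q + \frac{q}{Q-q}\,A^q = \frac{Q}{Q-q}\,A^q
   \qquad (q < Q), \\[4pt]
\mathbb{E}(|Z|^q)
  &\le \int_0^{\|Z\|_\infty} q t^{q-1}\Big(\frac{A}{t}\Big)^{Q} dt
   = \frac{q}{q-Q}\,A^Q\,\|Z\|_\infty^{q-Q}
   \qquad (q > Q).
\end{align*}
```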



For $q < 2/\alpha - 2$, this generalizes to arbitrary separately Lipschitz functions of $n$ variables the moment bounds obtained for ergodic sums of Lipschitz functions in [MN08] (while the moment bounds for $q > 2/\alpha - 2$ are apparently new, even for ergodic sums). On the other hand, we improve the result in [CCRV09] in two respects: first, we obtain a polynomial concentration inequality with moment 2 for any $\alpha \in (0, 1/2)$ instead of $(0, 4 - \sqrt{15})$; second, we also obtain a polynomial concentration inequality with a moment whose order is larger than 2 and depends on $\alpha \in (0, 1/2)$.

Remark 7.4. There is a difference between Theorems 4.1 (about strong moments) and 6.1 (about weak moments): in the former, the range of parameters is $q \ge 2$, while we require $q > 2$ in the latter. It turns out that Theorem 6.1 is false for $q = 2$, as testified by the map (7.4) with $\alpha = 1/2$. For such a map, if $f$ is a Hölder function with $\int f\,d\mu = 0$ and $f(0) \ne 0$, then $S_n f/\sqrt{n \log n}$ converges in distribution to a Gaussian [Gou04a, Page 88]. If Theorem 6.1 were true for $q = 2$, we would have $\mu\{|S_n f| > t\} \le C t^{-2} n$, hence $\mu\{|S_n f/\sqrt{n \log n}| > t\} \le C t^{-2} (n \log n)^{-1} n \to 0$, implying that $S_n f/\sqrt{n \log n}$ tends in probability to 0 and giving a contradiction.

There are also invertible examples exhibiting an intermittent behavior, notably coming from billiards. Indeed, apart from the stadium billiard (with a weak moment of order 2 and therefore not covered by our results), Chernov and Zhang studied in [CZ05a, CZ05b] several classes of billiards for which the decay of correlations behaves like $O((\log n)^C/n^{1/\alpha - 1})$, for some parameter $\alpha$ that can be chosen freely in $(0, 1/2]$ and some $C > 0$. This decay rate is obtained by modeling those billiards by non-uniform invertible Young towers with well controlled tails. Therefore, we can apply Theorem 7.2 to those maps, yielding polynomial concentration inequalities for any exponent $p < 2/\alpha - 2$, just like in the above one-dimensional non-invertible situation.
7.2. Empirical covariance. For a Lipschitz observable $f$ such that $\int f\,d\mu = 0$, the auto-covariance of the process $\{f \circ T^k\}$ is defined as usual by

\[ C_f(\ell) = \int f \cdot f \circ T^\ell\,d\mu. \tag{7.5} \]

An obvious estimator for $C_f(\ell)$ is

\[ \hat{C}_f(n,\ell,x) = \frac{1}{n}\sum_{j=0}^{n-1} f(T^j x)\,f(T^{j+\ell} x). \]

We could as well consider the covariance between $\{f \circ T^k\}$ and $\{g \circ T^k\}$, for a pair of Lipschitz observables $f, g$. For each $\ell \ge 0$, the ergodic theorem tells us that $\hat{C}_f(n,\ell,x) \to C_f(\ell)$ $\mu$-almost surely, as $n \to \infty$. Considering the function of $n + \ell$ variables $K(x_0, \dots, x_{n+\ell-1}) = \frac{1}{n}\sum_{j=0}^{n-1} f(x_j) f(x_{j+\ell})$, we obtain immediately (noting that $\int \hat{C}_f(n,\ell,x)\,d\mu(x) = C_f(\ell)$) the following theorems.

Theorem 7.5. Let $(X, T)$ be a dynamical system modeled by a uniform Young tower with exponential tails and $\mu$ its SRB measure. Let $f : X \to \mathbb{R}$ be a Lipschitz function. There exists a constant $c > 0$ such that, for any $n, \ell \in \mathbb{N}$ and for any $t > 0$,

\[ \mu\big\{x \in X : |\hat{C}_f(n,\ell,x) - C_f(\ell)| > t\big\} \le 2\,e^{-c\,\frac{n^2 t^2}{n+\ell}}. \]



Theorem 7.6. Let $(X, T)$ be a dynamical system modeled by a non-uniform Young tower with weak $L^q$ tails, for some $q > 2$, and $\mu$ its SRB measure. Let $f : X \to \mathbb{R}$ be a Lipschitz function. There exists a constant $c > 0$ such that, for any $n, \ell \in \mathbb{N}$ and for any $t > 0$,

\[ \mu\big\{x \in X : |\hat{C}_f(n,\ell,x) - C_f(\ell)| > t\big\} \le c\,\Big(\frac{n+\ell}{n^2}\Big)^{q-1} \frac{1}{t^{2q-2}}. \]
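As an illustration, the empirical covariance estimator can be evaluated along a numerically computed orbit. The sketch below is not part of the original argument: the function names are hypothetical, the intermittent map is assumed to have the classical form $T(x) = x(1 + 2^\alpha x^\alpha)$ on $[0, 1/2]$ and $T(x) = 2x - 1$ on $(1/2, 1]$, and floating-point orbits only approximate true orbits.

```python
def intermittent_map(x, alpha):
    """One step of the intermittent map on [0, 1].

    Assumed form (classical intermittent map):
    T(x) = x * (1 + (2x)^alpha) on [0, 1/2], T(x) = 2x - 1 on (1/2, 1].
    """
    if x <= 0.5:
        return x * (1.0 + (2.0 * x) ** alpha)
    return 2.0 * x - 1.0


def orbit(x0, length, step):
    """Return the orbit segment [x0, step(x0), ..., step^(length-1)(x0)]."""
    points = [x0]
    for _ in range(length - 1):
        points.append(step(points[-1]))
    return points


def empirical_covariance(f, points, n, lag):
    """Estimator (1/n) * sum_{j=0}^{n-1} f(x_j) * f(x_{j+lag}).

    `points` must contain at least n + lag consecutive orbit points.
    """
    if len(points) < n + lag:
        raise ValueError("need at least n + lag orbit points")
    return sum(f(points[j]) * f(points[j + lag]) for j in range(n)) / n
```

A typical call would build `pts = orbit(x0, n + lag, lambda x: intermittent_map(x, alpha))` and then evaluate `empirical_covariance(f, pts, n, lag)`. The observable `f` should be centered with respect to $\mu$ for the result to estimate $C_f(\ell)$.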



7.3. Empirical measure. Given $x \in X$ in an ergodic compact dynamical system $(X, T, \mu)$, let

\[ \mathcal{E}_n(x) = \frac{1}{n}\sum_{j=0}^{n-1} \delta_{T^j x} \]

be the associated empirical measure. By Birkhoff's ergodic theorem, $\mathcal{E}_n(x)$ vaguely converges to $\mu$, for $\mu$-almost every $x$. Our aim is to quantify the 'speed' at which this convergence takes place. We use the Kantorovich distance (compatible with vague convergence): for two probability measures $\mu_1, \mu_2$ on $X$, let

\[ \mathrm{dist}_K(\mu_1, \mu_2) = \sup\Big\{ \int g\,d\mu_1 - \int g\,d\mu_2 \;:\; g : X \to \mathbb{R} \text{ is 1-Lipschitz} \Big\}. \]

Set

\[ \mathcal{D}_n(x) = \mathrm{dist}_K(\mathcal{E}_n(x), \mu). \]

We have the following general bounds.

Theorem 7.7. Let $(X,T)$ be a dynamical system modeled by a uniform Young tower with exponential tails and $\mu$ its SRB measure. Let $f : X \to \mathbb{R}$ be a Lipschitz function with $\int f\,d\mu = 0$. There exists a constant $C > 0$ such that, for any $n \in \mathbb{N}$ and for any $t > 0$,

\[ \mu\Big\{x \in X : \mathcal{D}_n(x) - \int \mathcal{D}_n(y)\,d\mu(y) > \frac{t}{\sqrt{n}}\Big\} \le e^{-C t^2}. \]






Theorem 7.8. Let $(X,T)$ be a dynamical system modeled by a non-uniform Young tower with weak $L^q$ tails, for some $q > 2$, and $\mu$ its SRB measure. Let $f : X \to \mathbb{R}$ be a Lipschitz function with $\int f\,d\mu = 0$. There exists a constant $C > 0$ such that, for all $n \in \mathbb{N}$ and all $t > 0$,

\[ \mu\Big\{x \in X : \Big|\mathcal{D}_n(x) - \int \mathcal{D}_n(y)\,d\mu(y)\Big| > \frac{t}{\sqrt{n}}\Big\} \le \frac{C}{t^{2q-2}}. \]



These bounds follow at once by applying either (7.2) or (7.3) to the function

\[ K(x_0, \dots, x_{n-1}) = \sup\Big\{ \frac{1}{n}\sum_{j=0}^{n-1} g(x_j) - \int g\,d\mu \;:\; g : X \to \mathbb{R} \text{ is 1-Lipschitz} \Big\}, \]

whose Lipschitz constants are uniformly bounded by $1/n$. The natural next step is to seek an upper bound for $\int \mathcal{D}_n(y)\,d\mu(y)$. We are not able to obtain an a priori sufficiently good estimate unless we restrict to one-dimensional systems.

Corollary 7.9. Let $(X, T)$ be a one-dimensional dynamical system satisfying the assumptions of Theorem 7.7. There exist some constants $B, C > 0$ such that, for any $n \in \mathbb{N}$ and for any $t > 0$,

\[ \mu\Big\{x \in X : \mathcal{D}_n(x) > \frac{t}{\sqrt{n}} + \frac{B}{n^{1/4}}\Big\} \le e^{-C t^2}. \]

Corollary 7.10. Let $(X,T)$ be a one-dimensional dynamical system satisfying the assumptions of Theorem 7.8. There exist some constants $B, C > 0$ such that, for any $n \in \mathbb{N}$ and for any $t > 0$,

\[ \mu\Big\{x \in X : \mathcal{D}_n(x) > \frac{t}{n^{1/2}} + \frac{B}{n^{1/4}}\Big\} \le \frac{C}{t^{2q-2}}. \]

These two corollaries follow immediately if we can prove that there exists $B > 0$ such that, for any $n \in \mathbb{N}$,

\[ \int \mathcal{D}_n\,d\mu \le \frac{B}{n^{1/4}}. \]

The proof is found in [CCS05b, Theorem 5.2]. The point is that in dimension one, there is a special representation of the Kantorovich distance in terms of the distribution functions. The estimate then follows easily using the fact that the auto-covariance of Lipschitz observables is summable under the above assumptions.
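In dimension one, the Kantorovich distance between two probability measures on an interval equals the $L^1$ distance between their distribution functions. The sketch below uses this representation to compute the distance between an empirical measure and a reference measure. As a simplifying assumption (not taken from the text), the reference measure is Lebesgue measure on $[0,1]$, whose distribution function is $F(s) = s$; the function names are hypothetical.

```python
def _abs_gap_integral(c, a, b):
    """Exact integral of |c - s| for s in [a, b], assuming a <= b."""
    if c <= a:
        return ((b - c) ** 2 - (a - c) ** 2) / 2.0
    if c >= b:
        return ((c - a) ** 2 - (c - b) ** 2) / 2.0
    return ((c - a) ** 2 + (b - c) ** 2) / 2.0


def kantorovich_to_uniform(points):
    """Kantorovich distance between the empirical measure of `points`
    (all assumed to lie in [0, 1]) and Lebesgue measure on [0, 1],
    computed as the integral over [0, 1] of |F_n(s) - s|."""
    xs = sorted(points)
    n = len(xs)
    knots = [0.0] + xs + [1.0]
    total = 0.0
    for k in range(n + 1):
        # On [knots[k], knots[k+1]) the empirical distribution function is k/n.
        total += _abs_gap_integral(k / n, knots[k], knots[k + 1])
    return total
```

For a general invariant density $h$, one would replace $s$ by the distribution function of $\mu$ and integrate numerically; the piecewise-exact formula above is specific to the uniform case.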

For the map (7.4), we can use Corollary 7.3 to get the bound

\[ \mu\Big\{x \in X : \mathcal{D}_n(x) > \frac{t}{n^{1/2}} + \frac{B}{n^{1/4}}\Big\} \le \frac{C}{t^{2/\alpha-2}}, \]

for any $n \in \mathbb{N}$ and for any $t > 0$.

Remark 7.11. What explains the power $1/4$ of $n$ is the fact that at some stage, one has to approximate a characteristic function of a set by a Lipschitz function. If one can control the auto-covariance of functions with bounded variation, one gets

\[ \int \mathcal{D}_n\,d\mu \le \frac{B}{\sqrt{n}}. \]

This is the case for uniformly piecewise expanding maps of the interval [CMS02]. This is also the case for the quadratic map with Benedicks-Carleson parameters [You92]. Since we proved that this system satisfies an exponential concentration inequality, we get

\[ \mu\Big\{x \in X : \mathcal{D}_n(x) > \frac{t}{\sqrt{n}}\Big\} \le e^{-c t^2}, \]

for any $n \in \mathbb{N}$ and for any $t$ greater than some $t_0 > 0$.

7.4. Kernel density estimation. The estimation from an orbit of the density $h$ of the invariant measure of a one-dimensional dynamical system $(X, T)$ is based on the estimator

\[ \hat{h}_n(s;x) = \frac{1}{n a_n}\sum_{j=0}^{n-1} \psi\Big(\frac{s - T^j x}{a_n}\Big), \]

where $a_n$ is a sequence of positive numbers going to 0 but such that $n a_n$ goes to $\infty$, and $\psi$ is a 'kernel', that is, a non-negative Lipschitz function with compact support. We suppose that it is fixed in the sequel.
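A minimal sketch of such a kernel density estimator, evaluated on a list of orbit points, is given here for concreteness. The triangular kernel is one possible choice of a non-negative Lipschitz function with compact support; it is an assumption made for illustration, as are the function names.

```python
def triangular_kernel(u):
    """Non-negative Lipschitz kernel supported on [-1, 1] (illustrative choice)."""
    return max(0.0, 1.0 - abs(u))


def kernel_density_estimate(s, points, bandwidth, kernel=triangular_kernel):
    """Evaluate (1 / (n * a_n)) * sum_j kernel((s - x_j) / a_n) at the point s.

    `points` plays the role of the orbit x, Tx, ..., T^{n-1}x and
    `bandwidth` plays the role of a_n.
    """
    n = len(points)
    return sum(kernel((s - x) / bandwidth) for x in points) / (n * bandwidth)
```

The $L^1$ error between the estimate and the density $h$ would then be approximated by numerical integration over $s$, which is omitted here.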

As proved in [CCS05a, Appendix C], the density of the invariant measure for a one-dimensional system modeled by a uniform Young tower with exponential tails has the following property: there exist some constants $B > 0$ and $\tau > 0$ such that

\[ \int |h(s) - h(s - t)|\,ds \le B|t|^\tau, \quad \forall t \in \mathbb{R}. \tag{7.6} \]

We have the following result about the $L^1$ convergence of empirical densities.

Theorem 7.12. Let $(X,T)$ be a one-dimensional dynamical system modeled by a uniform Young tower with exponential tails and $\mu$ its SRB measure. There exist $c_1, c_2 > 0$ such that, for any $t > c_1\big(a_n^\tau + 1/(\sqrt{n}\,a_n^2)\big)$ and for any $n \in \mathbb{N}$,

\[ \mu\Big\{x \in X : \int |\hat{h}_n(s;x) - h(s)|\,ds > t\Big\} \le e^{-c_2 t^2 n a_n^2}. \]

The proof is similar to the proof of Theorem 5.2 in [CCS05a] except that we use an exponential concentration inequality instead of a polynomial concentration inequality with moment 2; hence we obtain a much stronger bound. (See also [CMS02, Theorem III.2] for uniformly piecewise expanding maps of the interval.) The property (7.6) is used to obtain an upper bound for $\int\!\!\int |\hat{h}_n(s;x) - h(s)|\,ds\,d\mu$.

We do not know if the property (7.6) holds for the density of the invariant measure of every one-dimensional system modeled by a non-uniform Young tower with polynomial tails. But for the special case of the intermittent map (7.4), it is easy to check that (7.6) is true with $\tau = 1 - \alpha$. Therefore, applying Corollary 7.3 we get the following result.

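Theorem 7.13. Let $T$ be the map (7.4) and $\mu$ its absolutely continuous invariant probability measure. There exist $c_1, c_2 > 0$ such that for any $t > c_1\big(a_n^{1-\alpha} + 1/(\sqrt{n}\,a_n^2)\big)$ and for any $n \in \mathbb{N}$,

\[ \mu\Big\{x \in X : \int |\hat{h}_n(s;x) - h(s)|\,ds > t\Big\} \le \frac{c_2}{n^{\frac{1}{\alpha}-1}\,a_n^{\frac{2}{\alpha}-2}\,t^{\frac{2}{\alpha}-2}}. \]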



7.5. Tracing orbit properties. Let $A$ be a measurable subset of $X$ such that $\mu(A) > 0$ and define for all $n \in \mathbb{N}$

\[ S_A(x,n) = \frac{1}{n}\inf_{y \in A}\sum_{j=0}^{n-1} d(T^j x, T^j y), \]

where $d$ is the distance on $X$. This quantity, between 0 and 1, measures how well we can trace the orbit of some initial condition not in $A$ by an orbit from an element of $A$.
Theorem 7.14. Let $(X,T)$ be a dynamical system modeled by a uniform Young tower with exponential tails and $\mu$ its SRB measure. There exist constants $c_1, c_2 > 0$ such that, for any measurable subset $A \subset X$ with $\mu(A) > 0$, for any $n \in \mathbb{N}$ and for any $t > 0$,

\[ \mu\Big\{x \in X : S_A(x,n) > c_1\,\frac{\sqrt{\log(1/\mu(A))}}{\sqrt{n}} + \frac{t}{\sqrt{n}}\Big\} \le e^{-c_2 t^2}. \]

Again, the proof is the same as [CMS02, Theorem IV.1] because it relies only on the exponential concentration inequality.

Theorem 7.15. Let $(X,T)$ be a dynamical system modeled by a non-uniform Young tower with weak $L^q$ tails, for some $q > 2$, and $\mu$ its SRB measure. There exist constants $c_1, c_2 > 0$ such that, for any measurable subset $A \subset X$ with $\mu(A) > 0$, for any $n \in \mathbb{N}$ and for any $t > 0$,

\[ \mu\Big\{x \in X : S_A(x,n) > \frac{1}{n^{(q-1)/(2q-1)}}\Big(t + \frac{c_1}{\mu(A)}\Big)\Big\} \le \frac{c_2}{n^{(q-1)/(2q-1)}\,t^{2q-2}}. \]

The proof follows the lines of that of [CMS02, Theorem IV.1] except that one uses the weak polynomial concentration inequality instead of the exponential concentration inequality as in the previous theorem.

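For the intermittent maps (7.4), we can use Corollary 7.3. We get that there exist constants $c_1, c_2 > 0$ such that for any subset $A \subset [0, 1]$ with $\mu(A) > 0$, for any $n \in \mathbb{N}$ and for any $t > 0$,

\[ \mu\Big\{x \in [0,1] : S_A(x,n) > \frac{1}{n^{(1/\alpha-1)/(2/\alpha-1)}}\Big(t + \frac{c_1}{\mu(A)}\Big)\Big\} \le \frac{c_2}{n^{(1/\alpha-1)/(2/\alpha-1)}\,t^{2/\alpha-2}}. \]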

We now formulate similar results for the number of mismatches at a given precision. Let $A$ be a measurable subset of $X$ such that $\mu(A) > 0$ and $\epsilon > 0$. For all $n \in \mathbb{N}$ define

\[ M_A(x, n, \epsilon) = \frac{1}{n}\inf_{y \in A} \mathrm{Card}\{0 \le j \le n - 1 : d(T^j x, T^j y) > \epsilon\}. \]
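Both tracing quantities can be approximated numerically by replacing the infimum over $A$ with a minimum over a finite sample of points of $A$. The sketch below makes that assumption explicit, so it only yields upper bounds for $S_A$ and $M_A$; the function names and the default distance on the real line are hypothetical choices.

```python
def tracing_distance(step, x, candidates, n, dist=lambda a, b: abs(a - b)):
    """Upper bound for S_A(x, n): minimum over the finite list `candidates`
    (sample points of A) of (1/n) * sum_{j<n} dist(T^j x, T^j y)."""
    best = float("inf")
    for y in candidates:
        total, px, py = 0.0, x, y
        for _ in range(n):
            total += dist(px, py)
            px, py = step(px), step(py)
        best = min(best, total)
    return best / n


def mismatch_fraction(step, x, candidates, n, eps, dist=lambda a, b: abs(a - b)):
    """Upper bound for M_A(x, n, eps): minimum over `candidates` of the
    fraction of times j < n at which dist(T^j x, T^j y) > eps."""
    best = n
    for y in candidates:
        count, px, py = 0, x, y
        for _ in range(n):
            if dist(px, py) > eps:
                count += 1
            px, py = step(px), step(py)
        best = min(best, count)
    return best / n
```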

We have the following result. 

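Theorem 7.16. Let $(X,T)$ be a dynamical system modeled by a Young tower with exponential tails and $\mu$ its SRB measure. There exist constants $c_1, c_2 > 0$ such that, if $A \subset X$ is such that $\mu(A) > 0$, then for any $0 < \epsilon < 1/2$, for any $n \in \mathbb{N}$ and for any $t > 0$,

\[ \mu\Big\{x \in X : M_A(x,n,\epsilon) > c_1\,\frac{\epsilon^{-1}\sqrt{\log(1/\mu(A))}}{\sqrt{n}} + \frac{t\,\epsilon^{-1}}{\sqrt{n}}\Big\} \le e^{-c_2 t^2}. \]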

Theorem 7.17. Let $(X,T)$ be a dynamical system modeled by a non-uniform Young tower with weak $L^q$ tails, for some $q > 2$, and $\mu$ its SRB measure. There exist constants $c_1, c_2 > 0$ such that, if $A \subset X$ is such that $\mu(A) > 0$, then for any $0 < \epsilon < 1/2$, for any $n \in \mathbb{N}$ and for any $t > 0$,

\[ \mu\Big\{x \in X : M_A(x,n,\epsilon) > \frac{1}{\epsilon^{(q-1)/(q-1/2)}\,n^{(q-1)/(2q-1)}}\Big(t + \frac{c_1}{\mu(A)}\Big)\Big\} \le \frac{c_2}{\epsilon^{(q-1)/(q-1/2)}\,n^{(q-1)/(2q-1)}\,t^{2q-2}}. \]

Once more, the proofs are almost the same as [CMS02, Theorem IV.2].
7.6. Integrated periodogram. Let $(X,T,\mu)$ be a dynamical system and $f : X \to \mathbb{R}$ be a Lipschitz function such that $\int f\,d\mu = 0$. Define the empirical integrated periodogram function of the process $\{f \circ T^k\}_{k \ge 0}$ by

\[ J_n(x,\omega) = \int_0^\omega \frac{1}{n}\Big|\sum_{j=0}^{n-1} e^{-\mathrm{i}js} f(T^j x)\Big|^2 ds, \quad \omega \in [0, 2\pi]. \]

Let

\[ J(\omega) = C_f(0)\,\omega + 2\sum_{\ell=1}^{\infty} \frac{\sin(\omega\ell)}{\ell}\,C_f(\ell), \]

where $C_f(\ell)$ is defined in (7.5).
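For numerical experiments, the empirical integrated periodogram can be evaluated in closed form: expanding the squared modulus gives $\frac{1}{n}\big[\omega\sum_j f_j^2 + 2\sum_{\ell \ge 1}\frac{\sin(\ell\omega)}{\ell}\sum_j f_j f_{j+\ell}\big]$ with $f_j = f(T^j x)$. The sketch below implements this identity; the function name is hypothetical and the input is simply the list of observed values $f_j$.

```python
import math


def integrated_periodogram(values, omega):
    """Integral over [0, omega] of (1/n) * |sum_j exp(-i*j*s) * values[j]|^2 ds.

    Uses |sum_j e^{-ijs} f_j|^2 = sum_j f_j^2 + 2 * sum_{lag>=1} cos(lag*s) * c_lag,
    where c_lag = sum_j f_j * f_{j+lag}, and integrates term by term.
    """
    n = len(values)
    total = omega * sum(v * v for v in values)
    for lag in range(1, n):
        cross = sum(values[j] * values[j + lag] for j in range(n - lag))
        total += 2.0 * cross * math.sin(lag * omega) / lag
    return total / n
```

The double loop costs $O(n^2)$ operations per frequency; for long orbits one would precompute the lagged sums once and reuse them for every $\omega$.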

Theorem 7.18. Let $(X, T)$ be a dynamical system modeled by a uniform Young tower with exponential tails and $\mu$ its SRB measure. Let $f : X \to \mathbb{R}$ be a Lipschitz function such that $\int f\,d\mu = 0$. There exist some positive constants $c_1, c_2$ such that for any $n \in \mathbb{N}$ and for any $t > 0$,

\[ \mu\Big\{x \in X : \sup_{\omega \in [0,2\pi]} |J_n(x,\omega) - J(\omega)| > t + \frac{c_1(1 + \log n)^{3/2}}{\sqrt{n}}\Big\} \le e^{-c_2 n t^2/(1+\log n)^2}. \]

The observable $\sup_{\omega \in [0,2\pi]} |J_n(x,\omega) - J(\omega)|$ was studied in [CCS05b] in the same setting but using the polynomial concentration inequality with moment 2. We get here a stronger result since we now have the exponential concentration inequality at hand.



Proof. Let

\[ K(x_0, \dots, x_{n-1}) = \sup_{\omega \in [0,2\pi]} \Big| \int_0^\omega \frac{1}{n}\Big|\sum_{j=0}^{n-1} e^{-\mathrm{i}js} f(x_j)\Big|^2 ds - J(\omega) \Big|. \tag{7.7} \]

The reader can verify that

\[ \sup_{0 \le \ell \le n-1} \mathrm{Lip}_\ell(K) \le \frac{c(1 + \log n)}{n} \tag{7.8} \]

for some constant $c > 0$. Let

\[ Q_n(x) = \sup_{\omega \in [0,2\pi]} |J_n(x,\omega) - J(\omega)|. \tag{7.9} \]

The major task is to estimate from above $\int Q_n\,d\mu$. We partly proceed as in [CCS05b, Page 2345]: we discretize $\omega$, that is, given any integer $N \in \mathbb{N}$, we define the finite sequence of numbers $(\omega_p)$ by $\omega_p = 2\pi p/N$, $p = 0, \dots, N$. We then define

\[ \widetilde{Q}_n(x) := \sup_{0 \le p \le N} |J_n(x,\omega_p) - J(\omega_p)|. \]

One can then show that there exists some $C > 0$ such that

\[ Q_n(x) \le \widetilde{Q}_n(x) + \frac{C}{N} \tag{7.10} \]

for all $x \in X$ and for all integers $n, N \in \mathbb{N}$.

We shall also use the fact (see [CCS05b] for more details) that there exists some $C > 0$ such that, for all $\omega$ and for any $n \in \mathbb{N}$,

\[ \Big| J(\omega) - \int J_n(x,\omega)\,d\mu(x) \Big| \le \frac{C}{n}. \tag{7.11} \]
n 
We now depart from [CCS05b] and use that for any real $\beta > 0$

\[ \int e^{\beta \widetilde{Q}_n}\,d\mu \le \sum_{p=0}^{N} \int e^{\beta[J_n(x,\omega_p) - J(\omega_p)]}\,d\mu(x) + \sum_{p=0}^{N} \int e^{-\beta[J_n(x,\omega_p) - J(\omega_p)]}\,d\mu(x). \tag{7.12} \]

We estimate each term in the first sum of the right-hand side of this inequality by using the exponential concentration inequality (7.1), (7.8) and (7.11):

\[ \int e^{\beta[J_n(x,\omega_p) - J(\omega_p)]}\,d\mu(x) \le e^{C\beta^2(1+\log n)^2/n} \cdot e^{C\beta/n}. \]

We get the same bound for each term in the second sum of the right-hand side of (7.12), hence

\[ \int e^{\beta \widetilde{Q}_n}\,d\mu \le 2(N + 1)\,e^{C\beta^2(1+\log n)^2/n + C\beta/n}. \]

We now use Jensen's inequality, (7.10) and (7.9) to get

\[ \int \sup_{\omega \in [0,2\pi]} |J_n(x,\omega) - J(\omega)|\,d\mu(x) \le \inf_{N \in \mathbb{N}} \Big[ \frac{1}{\beta}\log[2(N + 1)] + C\beta\,\frac{(1 + \log n)^2}{n} + \frac{C}{n} + \frac{C}{N} \Big]. \]

It remains to optimize over $N \in \mathbb{N}$ and $\beta > 0$ to obtain

\[ \int \sup_{\omega \in [0,2\pi]} |J_n(x,\omega) - J(\omega)|\,d\mu(x) \le \frac{c_1(1 + \log n)^{3/2}}{\sqrt{n}}. \]

We conclude the proof by applying (7.2) to the function (7.7), taking into account (7.8) and the previous estimate. □
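For the optimization over $N$ and $\beta$ in the proof above, one admissible (not necessarily optimal) choice is $N = n$ and $\beta = \sqrt{n\log[2(n+1)]}/(1+\log n)$. This particular choice is an illustration and is not taken from the source. Writing $L = 1 + \log n$ and using $\log[2(n+1)] \le 2L$ for $n \ge 1$, each term of the bound $\frac{1}{\beta}\log[2(N+1)] + C\beta L^2/n + C/n + C/N$ is of order $L^{3/2}/\sqrt{n}$:

```latex
\begin{align*}
\frac{1}{\beta}\log[2(n+1)]
  &= \frac{L\sqrt{\log[2(n+1)]}}{\sqrt{n}}
   \le \frac{\sqrt{2}\,L^{3/2}}{\sqrt{n}}, \\
C\beta\,\frac{L^{2}}{n}
  &= \frac{C\,L\sqrt{\log[2(n+1)]}}{\sqrt{n}}
   \le \frac{\sqrt{2}\,C\,L^{3/2}}{\sqrt{n}}, \\
\frac{C}{n} + \frac{C}{N}
  &= \frac{2C}{n}
   \le \frac{2C\,L^{3/2}}{\sqrt{n}} .
\end{align*}
```

Summing the three lines gives the stated estimate with $c_1 = \sqrt{2}(1 + C) + 2C$.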

Appendix A. A technical lemma 

Our goal in this section is to prove a technical result that was required to obtain polynomial concentration estimates in non-uniform invertible Young towers. Let us consider a non-invertible non-uniform Young tower in which the return time has a moment of order $q \ge 2$ (i.e., $\sum_h h^q\,\mu\{x \in \Delta_0 : \varphi(x) = h\} < \infty$). We define a function $\Phi_n$ by $\Phi_n(x) = \beta^{\mathrm{Card}\{j \in [1,n] \,:\, T^j x \in \Delta_0\}}$ for $x \in \Delta_0$ and $\Phi_n(x) = 0$ otherwise, where $\beta < 1$ is fixed.

The estimate we need in (5.3) is given in the following theorem.

Theorem A.1. For all nonnegative real numbers $L_k$,

\[ \int \Big( \sum_r \Big( \sum_{k > r} L_k\,\Phi_{k-r} \circ T^r \Big)^2 \Big)^{q-1} \le C \Big( \sum_k L_k^2 \Big)^{q-1}. \]

For the proof, let us expand the square on the left. The resulting function is bounded by $2\sum_r \sum_{k \ge \ell > r} L_k L_\ell\,\Phi_{k-r} \circ T^r$ since $\Phi_{\ell-r} \circ T^r \le 1$. Bounding $L_k L_\ell$ by $L_k^2 + L_\ell^2$, we get two terms that will be studied separately (but with very similar techniques). The theorem follows from the following lemmas.
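Lemma A.2. We have

\[ \int \Big( \sum_r \sum_{k > r} L_k^2\,(k - r)\,\Phi_{k-r} \circ T^r \Big)^{q-1} \le C \Big( \sum_k L_k^2 \Big)^{q-1}. \]

Lemma A.3. We have

\[ \int \Big( \sum_r \sum_{k > r} \Big( \sum_{j=r}^{k-1} L_j^2 \Big)\,\Phi_{k-r} \circ T^r \Big)^{q-1} \le C \Big( \sum_j L_j^2 \Big)^{q-1}. \]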

We will prove a more general result, encompassing those two lemmas and better suited to induction. We will need the following notion.

Definition A.4. A weight system is a set of numbers $u(r, k)$ for $r < k$ such that

(1) either $u(r, k) = M_k$ for all $r < k$,

(2) or $u(r, k) = \big(\sum_{j=r}^{k-1} M_j\big)/(k - r)$ for all $r < k$,

where $M_k$ is a summable sequence of nonnegative real numbers. In both cases, let $\Sigma = \sum_k M_k$ be the sum of the weight system.

Weight systems satisfy the following property. 

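Lemma A.5. Let $u(r, k)$ be a weight system. For all $m > 0$, we have $\sum_r u(r, r + m) \le \Sigma$.

Proof. If $u(r,k) = M_k$, then $\sum_r u(r, r + m) = \sum_r M_{r+m} \le \sum_r M_r = \Sigma$. If $u(r,k) = \big(\sum_{j=r}^{k-1} M_j\big)/(k - r)$, then

\[ \sum_r u(r, r + m) = m^{-1} \sum_{j=0}^{m-1} \sum_r M_{r+j} \le m^{-1} \sum_{j=0}^{m-1} \Sigma = \Sigma. \qquad \square \]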
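The inequality of Lemma A.5 is easy to check numerically for a finitely supported sequence. The sketch below is an illustration under stated assumptions: the sequence is indexed from 0 and vanishes outside the given list, the index $r$ ranges over nonnegative integers, and the helper names are hypothetical.

```python
def weight_first_type(M):
    """Weight system u(r, k) = M_k, with M_k = 0 outside the list M."""
    return lambda r, k: M[k] if 0 <= k < len(M) else 0.0


def weight_second_type(M):
    """Weight system u(r, k) = (M_r + ... + M_{k-1}) / (k - r), for r < k."""
    def u(r, k):
        window = sum(M[j] for j in range(max(r, 0), min(k, len(M))))
        return window / (k - r)
    return u


def diagonal_sum(u, m, r_max):
    """Sum of u(r, r + m) over 0 <= r < r_max."""
    return sum(u(r, r + m) for r in range(r_max))
```

For `M = [1.0, 2.0, 3.0]` the total mass is 6, and both `diagonal_sum(weight_first_type(M), 2, 3)` and `diagonal_sum(weight_second_type(M), 2, 3)` stay below it, as the lemma predicts.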

We will also need the following fact. 

Lemma A.6. Let $u(r, k)$ be a weight system with sum $\Sigma$, and let $c^{(1)}_n$ be a sequence with a moment of order 1. There exists a weight system $v(r,k)$ with sum at most $C\Sigma$ such that, for all $s < k$, we have $\sum_{r<s} u(r, k)\,c^{(1)}_{s-r} \le v(s, k)$.

Proof. Let $w(s, k) = \sum_{r<s} u(r, k)\,c^{(1)}_{s-r}$. If $u(r, k)$ is of the first type (i.e., $u(r, k) = M_k$), then $w(s, k) = \sum_{r<s} M_k\,c^{(1)}_{s-r} \le C M_k$, and one can take $v(s, k) = C M_k$. If $u(r, k)$ is of the second type (i.e., $u(r, k) = \big(\sum_{j=r}^{k-1} M_j\big)/(k - r)$), then

\[ w(s,k) = \sum_{r<s} \frac{\sum_{j=r}^{k-1} M_j}{k-r}\,c^{(1)}_{s-r} \le \frac{1}{k-s}\Big( \sum_{j<s} M_j \sum_{r \le j} c^{(1)}_{s-r} + \sum_{j=s}^{k-1} M_j \sum_{r<s} c^{(1)}_{s-r} \Big) \le \frac{1}{k-s}\Big( \sum_{j<s} M_j\,c^{(0)}_{s-j} + C\sum_{j=s}^{k-1} M_j \Big). \]

Let $M'_s = C M_s + \sum_{j<s} M_j\,c^{(0)}_{s-j}$. We get $w(s, k) \le \frac{1}{k-s}\big(M'_s + C\sum_{j=s+1}^{k-1} M_j\big)$, which is bounded by $\frac{1}{k-s}\sum_{j=s}^{k-1} M'_j$. Moreover, $\sum M'_j \le C \sum M_j$ since the sequence $c^{(0)}_n$ is summable. This shows that $w$ is bounded by a weight system $v$ with sum at most $C\Sigma$. □

The main lemma is the following: 

Lemma A.7. Consider a weight system $u(r,k)$, and real numbers $\gamma \ge 1$ and $Q \ge 1$ with $\gamma Q \le q - 1$. We have

\[ \int \Big| \sum_{k > r} u(r,k)\,(k - r)^\gamma\,\Phi_{k-r} \circ T^r \Big|^Q \le C\Sigma^Q. \]

This result implies Lemmas A.2 and A.3, using it with $\gamma = 1$, $Q = q - 1$ and the weights $L_k^2$ for the former, $\big(\sum_{j=r}^{k-1} L_j^2\big)/(k - r)$ for the latter.

We will prove the lemma directly for $Q \in [1, 2]$, while an induction will be required for $Q > 2$. When $u$ is a weight system, let us write $S(\gamma, u) = \sum_{k>r} u(r, k)\,(k - r)^\gamma\,\Phi_{k-r} \circ T^r$. We will construct another weight system $v(r, k)$ (with sum at most $C\Sigma$) such that

\[ \int |S(\gamma, u)|^Q \le C\Sigma^Q + C\Sigma^{Q/2} \int |S(2\gamma, v)|^{Q/2}. \]

By induction, the last integral is bounded by $C\Sigma^{Q/2}$, and we obtain the desired result.

Let us explain the strategy of the proof. First, since $\int \Phi_n \le c^{(q-1)}_n$ by Lemma A.8 below, we have

\[ \mathbb{E}(S(\gamma, u)) \le \sum_{k>r} (k - r)^\gamma\,u(r, k)\,c^{(q-1)}_{k-r} = \sum_m m^\gamma c^{(q-1)}_m \Big( \sum_r u(r, r + m) \Big) \le \sum_m m^\gamma c^{(q-1)}_m\,\Sigma, \]

by Lemma A.5. As $\gamma \le \gamma Q \le q - 1$, the sum in $m$ is finite, and we get $\mathbb{E}(S(\gamma, u)) \le C\Sigma$. Consequently, to prove the lemma, it suffices to bound $\int |S(\gamma,u) - \mathbb{E}(S(\gamma, u))|^Q$.

We decompose $S = S(\gamma, u)$ as $\mathbb{E}(S) + \sum_{s \ge 0} S_s \circ T^s$, where $S_s \circ T^s$ is a sequence of reverse martingale differences: writing $\mathcal{F}_0$ for the Borel $\sigma$-algebra and $\mathcal{F}_s = T^{-s}\mathcal{F}_0$, the function $S_s \circ T^s$ is $\mathcal{F}_s$-measurable and $\mathbb{E}(S_s \circ T^s \mid \mathcal{F}_{s+1}) = 0$, i.e., $\mathbb{E}(S_s \mid \mathcal{F}_1) = 0$. For any function $f$, one has $\mathbb{E}(f \mid \mathcal{F}_s) = (\mathcal{L}^s f) \circ T^s$, where $\mathcal{L}$ is the transfer operator. Therefore, $S_s$ is given by $S_s(z) = \mathcal{L}^s S(z) - \mathcal{L}^{s+1} S(Tz)$.

For $Q \in [1,2]$, the von Bahr-Esseen inequality [vBE65] yields

\[ \int |S - \mathbb{E}(S)|^Q \le C \sum_s \mathbb{E}(|S_s|^Q), \]

while for $Q > 2$ the Rosenthal-Burkholder inequality gives an additional term as follows:

\[ \int |S - \mathbb{E}(S)|^Q \le C\,\mathbb{E}\Big( \Big( \sum_s \mathbb{E}(S_s^2 \mid \mathcal{F}_1) \circ T^s \Big)^{Q/2} \Big) + C \sum_s \mathbb{E}(|S_s|^Q). \]

We will split each function $S_s$ into several parts that will be estimated separately. Plugging those bounds into the inequalities of von Bahr-Esseen (for $Q \in [1,2]$) and Rosenthal-Burkholder (for $Q > 2$) will give the desired result.

More precisely, if $h(x) \ne 0$, we have $\mathbb{E}(|S_s| \mid \mathcal{F}_1) = 0$ at the (unique) preimage of $x$ and there is nothing to estimate. On the other hand, if $h(x) = 0$ and if $z$ is a preimage of $x$ under $T$, we have

\[ S_s(z) = \mathcal{L}^s S(z) - \mathcal{L}^{s+1} S(x) = \sum_{k>r} (k - r)^\gamma\,u(r, k)\big( \mathcal{L}^s(\Phi_{k-r} \circ T^r)(z) - \mathcal{L}^{s+1}(\Phi_{k-r} \circ T^r)(x) \big). \]

When estimating $\mathbb{E}(S_s^2 \mid \mathcal{F}_1)$ or $\mathbb{E}(|S_s|^Q \mid \mathcal{F}_1)$, there is a contribution coming from $\mathcal{L}^{s+1} S(x)$ (involving a sum over $k > r$), and a contribution coming from the sum over the preimages $z$ of $x$ of $\mathcal{L}^s S(z)$ (involving a sum over $z$ and over $k > r$). We will treat separately those contributions depending on the positions of $k$ and $r$ with respect to $s$ and to $s - h$ (where $h$ is the height of the preimage $z$ of $x$ one is considering). Let $\pi z$ be the projection of $z$ in the basis of the tower. If $h \le s$, we have $\mathcal{L}^s S(z) = \mathcal{L}^{s-h} S(\pi z)$. (This is the interesting case: if $h > s$, then all the following estimates become easier, and we will not indicate the trivial modifications to be done in this case.)

We will study separately the following cases:

(1) $k > r \ge s + 1$, contribution of $\mathcal{L}^{s-h} S(\pi z) - \mathcal{L}^{s+1} S(x)$;

(2) $k > s + 1 > r$, contribution solely of $\mathcal{L}^{s+1} S(x)$;

(3) $k > s - h$, $\min(s + 1, k) > r$, contribution solely of $\mathcal{L}^{s-h} S(\pi z)$;

(4) $s + 1 \ge k > s - h$, $r < k$, contribution solely of $\mathcal{L}^{s+1} S(x)$;

(5) $s - h \ge k > r$, contribution of $\mathcal{L}^{s-h} S(\pi z) - \mathcal{L}^{s+1} S(x)$.

We will treat separately those five contributions, and see that all of them satisfy the desired bounds. We will need very precise estimates on the transfer operator, given in the following lemma. We recall that the notation $c^{(Q)}_n$ indicates a non-increasing sequence with a moment of order $Q$.

Lemma A.8. We have $\int \Phi_m \le c^{(q-1)}_m$. For $h(z) = 0$, we have $\mathcal{L}^n \Phi_m(z) \le c^{(q)}_n\,\Phi_{m-n}(z)$ if $n \le m$, and

\[ \Big| \mathcal{L}^n(\mathcal{L}^m \Phi_m)(z) - \sum_{b \le n} e(b,m) \Big| \le \sum_{b=0}^{n} c^{(q-2)}_{n-b} \sum_{i=0}^{m} c^{(q)}_{b+m-i}\,c^{(q)}_i, \]

where the scalar $e(b, m)$ only depends on $b$ and $m$ and is bounded by $\sum_{i=0}^{m} c^{(q)}_{b+m-i}\,c^{(q)}_i$.

The function $\Phi_m$ involves $m$ iterates of the transformation. While the transfer operator is eliminating some number $n \le m$ of those iterates, the improvement in the estimates depends on $n$, and $m - n$ iterates remain ready to be used (under the form of $\Phi_{m-n}$). Once all the variables are eliminated, $\mathcal{L}^n(\mathcal{L}^m \Phi_m)$ converges to the integral of $\Phi_m$ (which is equal to $\sum_{b \ge 0} e(b, m)$), with a more complicated error term whose precise form will play an important role later on.

Proof. Let us first assume $n \le m$. In this case, $\mathcal{L}^n \Phi_m(z) = \Phi_{m-n}(z) \cdot U_n 1(z)$, where the operator $U_n$ was introduced in the proof of Lemma 4.4. We proved there that $U_n 1 \le c^{(q)}_n$; the desired estimate follows.

For any point $x$ with height $i \in [0, m]$, we obtain $\mathcal{L}^m \Phi_m(x) = \mathcal{L}^{m-i} \Phi_m(\pi x) \le c^{(q)}_{m-i}$. On the other hand, if $h(x) = i > m$, we have $\mathcal{L}^m \Phi_m(x) = \Phi_m(T^{-m} x) = 0$, since $\Phi_m$ vanishes on points with positive height by definition. Let $\Gamma = \mathcal{L}^m \Phi_m$. We obtain

\[ \int \Phi_m = \int \Gamma \le \sum_{i=0}^{m} c^{(q)}_{m-i}\,\mu\{h = i\} \le c^{(q-1)}_m. \]

Let us now study $\mathcal{L}^n(\mathcal{L}^m \Phi_m) = \mathcal{L}^n \Gamma$, using the previous information regarding $\Gamma$. We will use the operators $T_k$ and $B_b$ that were introduced in Subsection 4.2, so that $\mathcal{L}^n \Gamma(z) = \sum_{k+b=n} T_k B_b \Gamma(z)$ for $h(z) = 0$. We explained there that $T_k = \Pi + E_k$ where $\Pi f = \big(\int f\big) 1_{\Delta_0}$ and $\|E_k f\| \le c^{(q-2)}_k \|f\|$. Hence,

\[ \mathcal{L}^n \Gamma(z) = \Pi \cdot \sum_{b \le n} B_b \Gamma + \sum_{k+b=n} E_k B_b \Gamma(z). \]

We estimate first $\|B_b \Gamma\|$. We have $B_b \Gamma(x) = \sum g^{(b)}(y)\,\Gamma(y)$, where we sum over the points $y \in T^{-b}(x)$ not returning to $\Delta_0$ before time $b$. If $h(y) = i$, the point $\pi y$ has a return time to the basis equal to $b + i$. Therefore, $|B_b \Gamma(x)| \le \sum_{i=0}^{m} c^{(q)}_{b+i}\,c^{(q)}_{m-i} = \sum_{i=0}^{m} c^{(q)}_{b+m-i}\,c^{(q)}_i$ (in view of the bound on $\Gamma$ at height $i$). The Lipschitz norm of $B_b \Gamma$ is estimated in the same way. Thus,

\[ \sum_{k+b=n} \|E_k B_b \Gamma\| \le \sum_{k+b=n} c^{(q-2)}_k \sum_{i=0}^{m} c^{(q)}_{b+m-i}\,c^{(q)}_i. \]

Finally, the statement of the lemma is satisfied letting $e(b,m) = \int B_b \Gamma = \Pi(B_b \Gamma)$. This scalar is independent of $n$ and bounded by $\sum_{i=0}^{m} c^{(q)}_{b+m-i}\,c^{(q)}_i$. □

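We will use the following simple remark. For $\kappa \ge 1$ and $x, y \ge 0$, we have $(x + y)^\kappa \le x^\kappa + C y (x + y)^{\kappa-1}$ (by Taylor's formula). By induction, this implies

\[ \Big( \sum_{k=1}^{n} x_k \Big)^\kappa \le C \sum_{k=1}^{n} x_k \Big( \sum_{i=1}^{k} x_i \Big)^{\kappa-1}. \tag{A.1} \]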



A.1. The case $k > r \ge s + 1$. When $k > r \ge s + 1$, we have $\mathcal{L}^{s+1}(\Phi_{k-r} \circ T^r)(x) = \Phi_{k-r} \circ T^{r-s-1}(x)$, while $\mathcal{L}^{s-h}(\Phi_{k-r} \circ T^r)(\pi z) = \Phi_{k-r} \circ T^{r-s+h}(\pi z)$. Since $T^{h+1}(\pi z) = x$, those terms coincide, and their contribution to $S_s(z)$ vanishes.

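A.2. The case $k > s + 1 > r$, contribution of $\mathcal{L}^{s+1} S(x)$. The contribution from $\Phi_{k-r} \circ T^r$ satisfies

\[ \mathcal{L}^{s+1}(\Phi_{k-r} \circ T^r)(x) = \mathcal{L}^{s+1-r}\Phi_{k-r}(x) \le c^{(q)}_{s+1-r}\,\Phi_{k-s-1}(x) \]

by Lemma A.8. Summing those contributions to $S_s(z)$ (for varying $k$ and $r$) gives a term $S_s^{(2)}$ which is bounded by

\[ \sum_{k > s+1 > r} u(r,k)\,(k-r)^\gamma\,c^{(q)}_{s+1-r}\,\Phi_{k-s-1}(x). \]

Let us note that this term does not depend on $z$. Since $k - r = (k - s - 1) + (s + 1 - r) \le 2(k - s - 1)(s + 1 - r)$ and since $(s + 1 - r)^\gamma c^{(q)}_{s+1-r} \le c^{(q-\gamma)}_{s+1-r}$, we have

\[ S_s^{(2)} \le C \sum_{k > s+1} \sum_{r \le s} u(r,k)\,c^{(q-\gamma)}_{s+1-r}\,(k-s-1)^\gamma\,\Phi_{k-s-1}(x). \]

By Lemma A.6, there exists a new weight system $v$ such that $\sum_{r \le s} u(r, k)\,c^{(q-\gamma)}_{s+1-r} \le v(s+1, k)$, yielding $S_s^{(2)} \le \sum_{k > s+1} v(s+1,k)\,(k-s-1)^\gamma\,\Phi_{k-s-1}(x)$. Moreover, the sum of the weight $v$ is at most $C\Sigma$.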
Let $\kappa \ge 1$; we estimate $|S_s^{(2)}(z)|^\kappa$. We apply the inequality (A.1) to $x_k = v(s + 1,k)\,(k - s - 1)^\gamma\,\Phi_{k-s-1}$, yielding

\[ |S_s^{(2)}|^\kappa \le C \sum_{k > s+1} v(s+1,k)\,(k-s-1)^\gamma\,\Phi_{k-s-1} \Big( \sum_{s+1 < \ell \le k} v(s+1,\ell)\,(\ell-s-1)^\gamma \Big)^{\kappa-1}. \]

We claim that the last sum is bounded by $C(k - s - 1)^\gamma \Sigma$. Indeed, if the weight $v$ is of the first type (i.e., $v(r,\ell) = M_\ell$), then we bound $(\ell - s - 1)^\gamma$ by $(k - s - 1)^\gamma$, to obtain $(k - s - 1)^\gamma \sum_{\ell=s+2}^{k} M_\ell \le C(k - s - 1)^\gamma \Sigma$. On the other hand, if $v$ is of the second type (i.e., $v(r, \ell) = \big(\sum_{j=r}^{\ell-1} M_j\big)/(\ell - r)$), then the sum is bounded by

\[ \sum_{\ell=s+2}^{k} \sum_{j=s+1}^{\ell-1} M_j\,(\ell - s - 1)^{\gamma-1} \le (k - s - 1)^{\gamma-1} \sum_{j=s+1}^{k-1} M_j\,(k-j) \le (k-s-1)^\gamma \sum_{j=s+1}^{k-1} M_j \le (k-s-1)^\gamma\,\Sigma. \]

We have proved that, for all $\kappa \ge 1$,

\[ |S_s^{(2)}|^\kappa \le C \sum_{k > s+1} v(s+1,k)\,(k-s-1)^{\kappa\gamma}\,\Phi_{k-s-1}(x)\,\Sigma^{\kappa-1}. \tag{A.2} \]

Let us now assume that $Q \in [1,2]$, and let us consider the contribution of $S_s^{(2)}$ to the von Bahr-Esseen inequality. It is given by

\[ \sum_s \mathbb{E}(|S_s^{(2)}|^Q) = \sum_s \mathbb{E}\big( \mathbb{E}(|S_s^{(2)}|^Q \mid \mathcal{F}_1) \big) \le \sum_s C \sum_{k > s+1} v(s+1,k)\,(k-s-1)^{Q\gamma}\,\mathbb{E}(\Phi_{k-s-1})\,\Sigma^{Q-1} \]

by (A.2). Since $\mathbb{E}(\Phi_{k-s-1}) \le c^{(q-1)}_{k-s-1}$, this can be written (letting $k = s + 1 + m$) as $C\Sigma^{Q-1} \sum_m m^{Q\gamma} c^{(q-1)}_m \sum_s v(s + 1, s + 1 + m)$. For fixed $m$, the sum $\sum_s v(s + 1, s + 1 + m)$ is bounded by $C\Sigma$ by Lemma A.5. As $Q\gamma \le q - 1$, the sequence $m^{Q\gamma} c^{(q-1)}_m$ is summable, and we obtain a bound $C\Sigma^Q$ as desired.

Assume now $Q > 2$. In this case, the second term in the Rosenthal-Burkholder inequality is bounded by $C\Sigma^Q$ as above. Using (A.2) (with $\kappa = 2$), the first term is at most

\[ \mathbb{E}\Big( \Big( \sum_s C \sum_{k > s+1} v(s+1,k)\,(k-s-1)^{2\gamma}\,\Phi_{k-s-1} \circ T^{s+1} \cdot \Sigma \Big)^{Q/2} \Big) = C\Sigma^{Q/2} \int |S(2\gamma, v)|^{Q/2}. \]

Since $\gamma' = 2\gamma$ and $Q' = Q/2$ satisfy $\gamma' Q' \le q - 1$, we can argue by induction to show that this term is again bounded by $C\Sigma^Q$.

A.3. The case $k > s - h$, $\min(s + 1, k) > r$, contribution of $\mathcal{L}^{s-h} S(\pi z)$. We should study

\[ S_s^{(3)}(z) = \mathcal{L}^{s-h}\Big( \sum_{k > s-h}\; \sum_{r \le \min(s, k-1)} u(r,k)\,(k-r)^\gamma\,\Phi_{k-r} \circ T^r \Big)(\pi z). \]

If $k > s - h$ and $r \in (s - h, s]$ with $r < k$, we have $\mathcal{L}^{s-h}(\Phi_{k-r} \circ T^r)(\pi z) = \Phi_{k-r} \circ T^{r-(s-h)}(\pi z)$. Since the point $T^{r-(s-h)}(\pi z)$ has positive height, the function $\Phi_{k-r}$ vanishes here. Therefore, we only have to consider the contribution of $k > s - h \ge r$. This is exactly the same thing as in the previous subsection, but for the point $\pi z$ instead of $x$. The inequality (A.2) gives, for all $\kappa \ge 1$,

\[ |S_s^{(3)}(z)|^\kappa \le C \sum_{k > s-h} v(s-h,k)\,(k-s+h)^{\kappa\gamma}\,\Phi_{k-s+h}(\pi z)\,\Sigma^{\kappa-1}, \]

where $v$ is a weight system with sum at most $C\Sigma$. For $k \in (s - h, s + 1]$, we simply bound $\Phi_{k-s+h}(\pi z)$ by 1, while for $k > s + 1$ we bound it by $\Phi_{k-s-1}(x)$, since $T^{h+1}(\pi z) = x$. Summing over the preimages $z$ of $x$, we get

\[ \mathbb{E}(|S_s^{(3)}|^\kappa \mid \mathcal{F}_1) \le C\Sigma^{\kappa-1} \sum_{h \ge 0} c^{(q)}_h \Big( \sum_{k=s-h+1}^{s+1} v(s-h,k)\,(k-s+h)^{\kappa\gamma} + \sum_{k > s+1} v(s-h,k)\,(k-s+h)^{\kappa\gamma}\,\Phi_{k-s-1}(x) \Big). \]



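In the first sum, we bound $k - s + h$ by $h + 1$ and we use the inequality $(h + 1)^{\kappa\gamma} c^{(q)}_h \le c^{(q-\kappa\gamma)}_h$. In the second sum, we have $c^{(q)}_h (k - s + h)^{\kappa\gamma} \le c^{(q-\kappa\gamma)}_h (k - s - 1)^{\kappa\gamma}$ by the same argument. If $\kappa\gamma \le q - 1$, the quantity $\sum_{h \ge 0} c^{(q-\kappa\gamma)}_h\,v(s - h, k)$ is bounded by $w(s + 1, k)$ where $w$ is a weight system with sum at most $C\Sigma$, by Lemma A.6. We obtain

\[ \mathbb{E}(|S_s^{(3)}|^\kappa \mid \mathcal{F}_1) \le C\Sigma^{\kappa-1} \Big( \sum_{h \ge 0} \sum_{k=s-h+1}^{s+1} c^{(q-\kappa\gamma)}_h\,v(s-h,k) + \sum_{k > s+1} w(s+1,k)\,(k-s-1)^{\kappa\gamma}\,\Phi_{k-s-1}(x) \Big). \tag{A.3} \]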

The second term is identical to the term appearing in the previous subsection, in (A.2). It follows in the same way that its contribution to the inequalities of von Bahr-Esseen (case $Q \in [1,2]$) and Rosenthal-Burkholder (case $Q > 2$) is bounded by $C\Sigma^Q$.

Let us consider the first term, first in the von Bahr-Esseen inequality (case $Q \in [1,2]$). Thanks to (A.3) (with $\kappa = Q$), its contribution is given by

\[ \sum_s C\Sigma^{Q-1} \sum_{h \ge 0} \sum_{k=s-h+1}^{s+1} c^{(q-Q\gamma)}_h\,v(s-h,k) = C\Sigma^{Q-1} \sum_{h \ge 0} c^{(q-Q\gamma)}_h \sum_{m=1}^{h+1} \sum_s v(s-h, s-h+m) \le C\Sigma^{Q-1} \sum_{h \ge 0} c^{(q-Q\gamma)}_h \sum_{m=1}^{h+1} \Sigma = C\Sigma^Q \sum_{h \ge 0} c^{(q-Q\gamma-1)}_h, \]

where we used Lemma A.5 for the inequality. Since $Q\gamma \le q - 1$, this is bounded by $C\Sigma^Q$.

When $Q > 2$, we use the Rosenthal-Burkholder inequality. As above, the last term in this inequality is bounded by $C\Sigma^Q$. Using (A.3) (with $\kappa = 2$), the first term is bounded by

\[ \Big( \sum_s C\Sigma \sum_{h \ge 0} \sum_{k=s-h+1}^{s+1} c^{(q-2\gamma)}_h\,v(s-h,k) \Big)^{Q/2}. \]

The same computation as above shows that this is bounded by $(C\Sigma^2)^{Q/2}$.
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A. 4. The case s+l>k>s— h, r < k, contribution of C s+1 S(x). The contribution 
coming from $fc_ r o T r satisfies 

£ s+1 ($k-r o T r ) = £ s+1 - fc £ fe - r $ fc _ r , 
which is controlled by Lemma lA.81 Summing over k € [s — h + 1, s + 1] and r < k, we obtain 
that the resulting contribution si^ is bounded by 

t=s-)i+lr<fc \6<s+l-fci=0 



(g-2) „(«) „(«) 



+ XT ^s+l-k-b E ^b+k-r-iH 
b<s+l-k i=0 / 

Since d^ + ^ k b is bounded, the second term is bounded by the first one. Since fc— r < (6+/c— 
r— we have fc— r < (b+k—r—i+l)(i+l), yielding (£:-r) 7 4+ fc _ r _jC-^ < 4^.^ r _jC> 9_7 ^. 



For k > 1, we obtain (letting m = k — r) 



s+l 

b+m—i 

h>0 \k=a-h+l b<s+l-k i>0 m>i 



Summing over s and using the inequality ^2xf < (^2xi) K , we get 

E^i 4) ri^)o^<E4 9) (E E E E c ^ 7) E^-^)4r^ 

s h>0 \ s k=s-h+l b<s+l-k i>0 m>i 

We reorganize the sums as follows. First, we write s + 1 = k + a for some a 6 [0, /i], so that 
the first three sums are replaced by ^a=o Sfc S&<a- Then, we move the sum over k to the 
end: since u{k — m,k) < £ for all m by Lemma IA.51 we get a bound 



/i>0 \a=0 Ko i>0 m>« 



The sum over $m \ge i$ is bounded by $d_b^{(q-\gamma)}$. The (finite) quantity $\sum_{i \ge 0} c_i^{(q-\gamma)}$ can be 
factorized out, giving a multiplicative constant. Since the sum $\sum_{b \le a} d_b^{(q-\gamma)}$ is uniformly 
bounded, we get an upper bound $\Sigma_\kappa \sum_{h \ge 0} (h+1)^\kappa c_h^{(q)} \le C\Sigma_\kappa$ when $\kappa < q$.



This readily implies that the contributions of $S_s^{(4)}$ to the inequalities of von Bahr-Esseen 
(case $1 \le Q \le 2$) and Rosenthal-Burkholder (case $Q > 2$) are bounded by $\Sigma_Q$, as desired. 

A.5. The case $s - h \ge k > r$. The contribution coming from $\Phi_{k-r} \circ T^r$ reads 

$$\mathcal{L}^{s-h}(\Phi_{k-r} \circ T^r)(\pi z) - \mathcal{L}^{s+1}(\Phi_{k-r} \circ T^r)(x) = \mathcal{L}^{s-h-k} \mathcal{L}^{k-r} \Phi_{k-r}(\pi z) - \mathcal{L}^{s+1-k} \mathcal{L}^{k-r} \Phi_{k-r}(x).$$

To estimate those contributions, we use Lemma A.8. The main terms $e(b, k-r)$ simplify 
partially: only those corresponding to $s-h-k < b \le s+1-k$ remain. As a consequence, 
the global contribution $S_s^{(5)}(z)$ is bounded by 

j ^ g iy \ 

E E 41-,-^ + E E ^r-A q) ■ 

b=s-h-k+l i=0 b=0 i=0 / 
Let us first note that $(k-r)^\gamma c^{(q)}_{b+k-r-i} c^{(q)}_i \le c^{(q-\gamma)}_{b+k-r-i} c^{(q-\gamma)}_i$ as in the previous subsection. 

We will then handle separately the two pieces $S_s^{(5,1)}(z)$ and $S_s^{(5,2)}(z)$ of this expression. 

Summing over $h$ and then over $s$, and using the inequality $\sum x_i^\kappa \le (\sum x_i)^\kappa$ as in the 
previous subsection, we get 

s h>0 \ s k<s-hb=s-h-k+l i>0 m>i 

Let us reorganize the sums essentially as in the previous subsection. First, let $s+1-h = k+a$ 
for some $a \ge 1$, so that the first sums become $\sum_{a \ge 1} \sum_k \sum_{b=a}^{a+h}$. Then, we move the sum 
over $k$ to the end, and we use the inequality $\sum_k u(k-m, k) \le \Sigma_\kappa$ for all $m$. This yields a 
bound 

SK E4 9) (EEE c S 9 - 7) E4r£ 

h>0 \a>l b=a i>0 m>i 

The last sum over $m$ is bounded by $d_b^{(q-\gamma)}$, which is independent of $i$. Therefore, we may 
factorize out the sum over $i$, since $\sum_i c_i^{(q-\gamma)} < \infty$. Since $d_b^{(q-\gamma)}$ is nonincreasing, we have 
$\sum_{b=a}^{a+h} d_b^{(q-\gamma)} \le (h+1)\, d_a^{(q-\gamma)}$. As $q - \gamma - 1 > 0$, the sequence $d_a^{(q-\gamma)}$ is summable, giving 
yet another multiplicative constant. We obtain a bound $C\Sigma_\kappa \sum_{h \ge 0} (h+1)^\kappa c_h^{(q)} \le C\Sigma_\kappa$ when 
$\kappa < q$. 
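
The two elementary facts about nonincreasing sequences used in this step can be checked directly. In the sketch below, $(d_b)_{b \ge 0}$ stands for a generic nonnegative, nonincreasing, summable sequence (a placeholder for the sequence appearing in the text), and $h \ge 0$, $a \ge 1$ are integers.

```latex
\sum_{b=a}^{a+h} d_b \;\le\; \sum_{b=a}^{a+h} d_a \;=\; (h+1)\, d_a ,
\qquad
\sum_{a \ge 1} \sum_{b=a}^{a+h} d_b \;\le\; (h+1) \sum_{a \ge 1} d_a \;<\; \infty .
```

The factor $h+1$ obtained in this way is the source of the weight $(h+1)^\kappa$ in the final sum over $h$, once the expression is raised to the power $\kappa$.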

Let us now study $S_s^{(5,2)}(z)$. We have 

^E(|,si 5 - 2) n-Fi)or s 

s 

* E i 9) (e E E <fc 2 V 6 E c ?~ 7) E «(* - k)c 



s—h—k 

(9-7) 

h>0 \ s k<s-h 6=0 i>0 m>i 



We proceed exactly as above, with the difference that the sum over $b$ goes from $0$ to $a - 1$. 
We get a bound 

a-l 



^ K E4 9) EE4 9 -i 2 V4 



7(9-7-1) 

h>0 \a>lb=0 



Since $q - \gamma - 1 < q - 2$, the convolution between $d^{(q-2)}$ and $d^{(q-\gamma-1)}$ is bounded by $C d^{(q-\gamma-1)}$. 
As $\gamma + 1 < q$, the sum over $a$ is finite, and we obtain a bound $C\Sigma_\kappa$. 

Gluing the two pieces together, we have shown that $\sum_s E(|S_s^{(5)}|^\kappa \mid \mathcal{F}_1) \circ T^s \le C\Sigma_\kappa$ for 
all $\kappa < q$. This readily implies that the contributions of $S_s^{(5)}$ to the inequalities of von 
Bahr-Esseen (case $1 \le Q \le 2$) and Rosenthal-Burkholder (case $Q > 2$) are bounded by $\Sigma_Q$, 
as desired. 